{
 "cells": [
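  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. For a continuous feature, which function can be used to visualize its distribution? (Naming the one you use most often is enough.) Give an example based on the code output.\n"
   ]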
  },
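  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Several kinds of plots can show a distribution, such as histograms and box plots.\n",
    "My go-to is seaborn's `displot` function, introduced in class (use seaborn whenever it can do the job; fall back to matplotlib only otherwise)."
   ]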
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CRIM</th>\n",
       "      <th>ZN</th>\n",
       "      <th>INDUS</th>\n",
       "      <th>CHAS</th>\n",
       "      <th>NOX</th>\n",
       "      <th>RM</th>\n",
       "      <th>AGE</th>\n",
       "      <th>DIS</th>\n",
       "      <th>TAX</th>\n",
       "      <th>PTRATIO</th>\n",
       "      <th>...</th>\n",
       "      <th>RAD_2</th>\n",
       "      <th>RAD_3</th>\n",
       "      <th>RAD_4</th>\n",
       "      <th>RAD_5</th>\n",
       "      <th>RAD_6</th>\n",
       "      <th>RAD_7</th>\n",
       "      <th>RAD_8</th>\n",
       "      <th>RAD_24</th>\n",
       "      <th>MEDV</th>\n",
       "      <th>log_MEDV</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>-0.419782</td>\n",
       "      <td>0.285654</td>\n",
       "      <td>-1.287909</td>\n",
       "      <td>-0.272599</td>\n",
       "      <td>-0.144217</td>\n",
       "      <td>0.413672</td>\n",
       "      <td>-0.120013</td>\n",
       "      <td>0.140214</td>\n",
       "      <td>-0.666608</td>\n",
       "      <td>-1.353192</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.159686</td>\n",
       "      <td>0.345176</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>-0.417339</td>\n",
       "      <td>-0.487292</td>\n",
       "      <td>-0.593381</td>\n",
       "      <td>-0.272599</td>\n",
       "      <td>-0.740262</td>\n",
       "      <td>0.194274</td>\n",
       "      <td>0.367166</td>\n",
       "      <td>0.557160</td>\n",
       "      <td>-0.987329</td>\n",
       "      <td>-0.475352</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>-0.101524</td>\n",
       "      <td>0.084104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>-0.417342</td>\n",
       "      <td>-0.487292</td>\n",
       "      <td>-0.593381</td>\n",
       "      <td>-0.272599</td>\n",
       "      <td>-0.740262</td>\n",
       "      <td>1.282714</td>\n",
       "      <td>-0.265812</td>\n",
       "      <td>0.557160</td>\n",
       "      <td>-0.987329</td>\n",
       "      <td>-0.475352</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1.324247</td>\n",
       "      <td>1.266776</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>-0.416750</td>\n",
       "      <td>-0.487292</td>\n",
       "      <td>-1.306878</td>\n",
       "      <td>-0.272599</td>\n",
       "      <td>-0.835284</td>\n",
       "      <td>1.016303</td>\n",
       "      <td>-0.809889</td>\n",
       "      <td>1.077737</td>\n",
       "      <td>-1.106115</td>\n",
       "      <td>-0.036432</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1.182758</td>\n",
       "      <td>1.170822</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>-0.412482</td>\n",
       "      <td>-0.487292</td>\n",
       "      <td>-1.306878</td>\n",
       "      <td>-0.272599</td>\n",
       "      <td>-0.835284</td>\n",
       "      <td>1.228577</td>\n",
       "      <td>-0.511180</td>\n",
       "      <td>1.077737</td>\n",
       "      <td>-1.106115</td>\n",
       "      <td>-0.036432</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1.487503</td>\n",
       "      <td>1.373242</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 23 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       CRIM        ZN     INDUS      CHAS       NOX        RM       AGE  \\\n",
       "0 -0.419782  0.285654 -1.287909 -0.272599 -0.144217  0.413672 -0.120013   \n",
       "1 -0.417339 -0.487292 -0.593381 -0.272599 -0.740262  0.194274  0.367166   \n",
       "2 -0.417342 -0.487292 -0.593381 -0.272599 -0.740262  1.282714 -0.265812   \n",
       "3 -0.416750 -0.487292 -1.306878 -0.272599 -0.835284  1.016303 -0.809889   \n",
       "4 -0.412482 -0.487292 -1.306878 -0.272599 -0.835284  1.228577 -0.511180   \n",
       "\n",
       "        DIS       TAX   PTRATIO  ...  RAD_2  RAD_3  RAD_4  RAD_5  RAD_6  \\\n",
       "0  0.140214 -0.666608 -1.353192  ...      0      0      0      0      0   \n",
       "1  0.557160 -0.987329 -0.475352  ...      1      0      0      0      0   \n",
       "2  0.557160 -0.987329 -0.475352  ...      1      0      0      0      0   \n",
       "3  1.077737 -1.106115 -0.036432  ...      0      1      0      0      0   \n",
       "4  1.077737 -1.106115 -0.036432  ...      0      1      0      0      0   \n",
       "\n",
       "   RAD_7  RAD_8  RAD_24      MEDV  log_MEDV  \n",
       "0      0      0       0  0.159686  0.345176  \n",
       "1      0      0       0 -0.101524  0.084104  \n",
       "2      0      0       0  1.324247  1.266776  \n",
       "3      0      0       0  1.182758  1.170822  \n",
       "4      0      0       0  1.487503  1.373242  \n",
       "\n",
       "[5 rows x 23 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
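   ],
   "source": [
    "import numpy as np   # array and matrix operations\n",
    "import pandas as pd  # tabular data handling\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.metrics import r2_score  # metric for evaluating regression models\n",
    "# path to where the data lies\n",
    "#dpath = './data/'\n",
    "# raw string, so the Windows backslashes are not read as escape sequences\n",
    "df = pd.read_csv(r\"D:\\python\\计算机视觉\\线性回归\\BostonHousePrice_CodeData\\FE_boston_housing.csv\")\n",
    "\n",
    "# Look at the first 5 rows to get an overview of each column (feature)\n",
    "df.head()"
   ]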
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Split the data into input features X and target y\n",
    "# Here y has two columns: the original MEDV and its log1p-transformed version\n",
    "col_y = [\"MEDV\", \"log_MEDV\"]\n",
    "y = df[col_y].copy()\n",
    "\n",
    "X = df.drop(columns=col_y)\n",
    "\n",
    "# Feature names, used later to label the learned weight coefficients\n",
    "feat_names = X.columns"
   ]
  },
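  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Histogram (in seaborn, `distplot` for continuous features and `countplot` for discrete ones; `distplot` is deprecated since seaborn 0.11 in favor of `histplot`/`displot`. pandas can also plot directly.)"
   ]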
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0.5, 0, 'Median value of owner-occupied homes')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "<Figure size 432x288 with 0 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEZCAYAAABsPmXUAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAdRklEQVR4nO3dfZxcVZ3n8c9XwkOgMQ9GezHJkKiIAlEHehB1RzsiY3iQ6KysMIiJMpuXI7roRAVHZpgdZWUcGXTWHZ24sAFkaSKiIIgSkIZhNGDiAgECEiCQB0hkINGOGZzIb/64p+Wmqe6qrqqu6pz6vl+vfnXde86953eqbv3q1rm37lVEYGZmeXlRuwMwM7Pmc3I3M8uQk7uZWYac3M3MMuTkbmaWISd3M7MMObmbmWXIyd2yJmmdpN9ImjZk/l2SQtIsSUtTnYHS392p3qxUb3D+ZknXSTqmtK4fSvqbCm3Pl/SkpAlj31OzXTm5Wyd4FDhlcELSHGDikDpfjIiu0t/rh5RPjogu4PXAcuA7khamsqXAaZI0ZJnTgMsjYmeT+mFWMyd36wSXAR8oTS8ALq1nRRHxZER8Bfhr4G8lvQj4LjAV+MPBepKmACfU245Zo5zcrROsAF4s6bWS9gDeB3yzwXVeDbwMODgidgDL2PUD5L8CD0TE3Q22Y1YXJ3frFIN778cADwAbh5R/UtLW0t8lVda3Kf2fmv5fApwkaXC45wNpnllb+ECPdYrLgNuA2VQeKvlSRJwzivVNT/+fBoiI2yX9Apgv6U7gD4A/biBes4Y4uVtHiIjHJD0KHAec3oRVvgfYAjxYmncpxR77wcCNEbG5Ce2Y1cXJ3TrJ6cCUiNhe7+mJkrqBk4BzgTMj4rlS8aXAOcDrgE80GqxZI5zcrWNExMMjFH9a0sdL0/8WEeVz47emUx23AyuBkyLiB0PWv07SjylOl7y2WXGb1UO+WYeZWX58toyZWYac3M3MMuTkbmaWoarJXdLFkrZIunfI/I9JelDSfZK+WJr/GUlrU9k7xyJoMzMbWS1nyywFvkrphx+S5gLzgddFxLOSXpbmHwKcDBwKvBy4SdKrI+K3IzUwbdq0mDVrVtVAtm/fzn777VdDyHlxvzuL+91ZGun3qlWrnoqIl1Yqq5rcI+I2SbOGzP4z4PyIeDbV2ZLmzwf60vxHJa0FjgR+MlIbs2bNYuXKldVCob+/n97e3qr1cuN+dxb3u7M00m9Jjw1XVu+Y+6uBP5R0h6RbJf1Bmj8dWF+qt4Hnf6ZtZmYtUu+PmCYAU4CjKK6hsUzSK4Ch17MGqHgivaRFwCKA7u5u+vv7qzY6MDBQU73cuN+dxf3uLGPV73qT+wbg6ih+AXWnpOeAaWn+zFK9GTx/9bxdRMQSYAlAT09P1PK1xF/bOov73Vnc7+aqd1jmu8DbASS9GtgLeIriJ9cnS9pb0mzgIODOZgRqZma1q7rnLukKoBeYJmkDxQWTLgYuTqdH/gZYkPbi75O0DLgf2AmcUe1MGTMza75azpY5ZZii9w9T/zzgvEaCMjOzxvgXqmZmGXJyNzPLkJO7mVmGfLMOG5VZZ1/fsrYWz9nJwtTeuvOPb1m7ZjnwnruZWYac3M3MMuTkbmaWISd3M7MMObmbmWXIyd3MLENO7mZmGXJyNzPLkJO7mVmGnNzNzDLk5G5mliEndzOzDDm5m5llyMndzCxDVZO7pIslbUn3Sx1a9klJIWlampakf5C0VtI9kg4fi6DNzGxktey5LwXmDZ0paSZwDPB4afaxwEHpbxHwtcZDNDOz0aqa3CPiNuDpCkUXAp8GojRvPnBpFFYAkyUd0JRIzcysZnWNuUs6EdgYEXcPKZoOrC9Nb0jzzMyshUZ9mz1J+wKfBf6oUnGFeVFhHpIWUQzd0N3dTX9/f9W2BwYGaqqXm/HU78Vzdrasre6Jz7c3XvrfCuPp9W4l97u5
6rmH6iuB2cDdkgBmAD+TdCTFnvrMUt0ZwKZKK4mIJcASgJ6enujt7a3acH9/P7XUy8146vfCFt9D9YLVxSa67tTelrXbbuPp9W4l97u5Rj0sExGrI+JlETErImZRJPTDI+JJ4FrgA+msmaOAbRHxRHNDNjOzamo5FfIK4CfAwZI2SDp9hOrfBx4B1gLfAD7SlCjNzGxUqg7LRMQpVcpnlR4HcEbjYZmZWSP8C1Uzsww5uZuZZcjJ3cwsQ07uZmYZcnI3M8uQk7uZWYac3M3MMuTkbmaWISd3M7MMObmbmWXIyd3MLENO7mZmGXJyNzPLkJO7mVmGnNzNzDLk5G5mliEndzOzDDm5m5llqJZ7qF4saYuke0vz/k7SA5LukfQdSZNLZZ+RtFbSg5LeOVaBm5nZ8GrZc18KzBsybzlwWES8Dvg58BkASYcAJwOHpmX+UdIeTYvWzMxqUjW5R8RtwNND5t0YETvT5ApgRno8H+iLiGcj4lFgLXBkE+M1M7MaKCKqV5JmAddFxGEVyr4HXBkR35T0VWBFRHwzlV0E3BARV1VYbhGwCKC7u/uIvr6+qnEMDAzQ1dVVtV5uxlO/V2/c1rK2uifC5h3F4znTJ7Ws3XYbT693K7nfozd37txVEdFTqWxCI0FJ+iywE7h8cFaFahU/PSJiCbAEoKenJ3p7e6u219/fTy31cjOe+r3w7Otb1tbiOTu5YHWxia47tbdl7bbbeHq9W8n9bq66k7ukBcAJwNHx/O7/BmBmqdoMYFP94ZmZWT3qOhVS0jzgLODEiPh1qeha4GRJe0uaDRwE3Nl4mGZmNhpV99wlXQH0AtMkbQDOpTg7Zm9guSQoxtk/HBH3SVoG3E8xXHNGRPx2rII3M7PKqib3iDilwuyLRqh/HnBeI0GZmVlj/AtVM7MMObmbmWXIyd3MLENO7mZmGXJyNzPLkJO7mVmGnNzNzDLk5G5mliEndzOzDDm5m5llyMndzCxDTu5mZhlycjczy5CTu5lZhpzczcwy5ORuZpYhJ3czsww5uZuZZahqcpd0saQtku4tzZsqabmkh9L/KWm+JP2DpLWS7pF0+FgGb2ZmldWy574UmDdk3tnAzRFxEHBzmgY4Fjgo/S0CvtacMM3MbDSqJveIuA14esjs+cAl6fElwLtL8y+NwgpgsqQDmhWsmZnVRhFRvZI0C7guIg5L01sjYnKp/JmImCLpOuD8iLg9zb8ZOCsiVlZY5yKKvXu6u7uP6OvrqxrHwMAAXV1dtfQrK+Op36s3bmtZW90TYfOO4vGc6ZNa1m67jafXu5Xc79GbO3fuqojoqVQ2oaGoXkgV5lX89IiIJcASgJ6enujt7a268v7+fmqpl5vx1O+FZ1/fsrYWz9nJBauLTXTdqb0ta7fdxtPr3Urud3PVe7bM5sHhlvR/S5q/AZhZqjcD2FR/eGZmVo96k/u1wIL0eAFwTWn+B9JZM0cB2yLiiQZjNDOzUao6LCPpCqAXmCZpA3AucD6wTNLpwOPASan694HjgLXAr4EPjkHMZmZWRdXkHhGnDFN0dIW6AZzRaFBmZtYY/0LVzCxDTu5mZhlycjczy5CTu5lZhpzczcwy5ORuZpYhJ3czsww5uZuZZcjJ3cwsQ07uZmYZcnI3M8uQk7uZWYac3M3MMuTkbmaWISd3M7MMObmbmWWo2TfINhsTs1p4Y+6h1p1/fNvaNquX99zNzDLUUHKX9AlJ90m6V9IVkvaRNFvSHZIeknSlpL2aFayZmdWm7mEZSdOB/w4cEhE7JC0DTqa4QfaFEdEn6evA6cDXmhKtAe0dojCz3UOjwzITgImSJgD7Ak8AbweuSuWXAO9usA0zMxslRUT9C0tnAucBO4AbgTOBFRHxqlQ+E7ghIg6rsOwiYBFAd3f3EX19fVXbGxgYoKurq+54d1dD+71647Y2RtM63RNh8452RwFzpk9qaXvezjtLI/2eO3fuqojoqVTWyLDMFGA+MBvYCnwLOLZC
1YqfHhGxBFgC0NPTE729vVXb7O/vp5Z6uRna74UdMiyzeM5OLljd/hO61p3a29L2vJ13lrHqdyPDMu8AHo2IX0TEvwNXA28GJqdhGoAZwKYGYzQzs1FqJLk/DhwlaV9JAo4G7gduAd6b6iwArmksRDMzG626k3tE3EFx4PRnwOq0riXAWcCfS1oLvAS4qAlxmpnZKDQ0oBkR5wLnDpn9CHBkI+s1M7PG+BeqZmYZcnI3M8uQk7uZWYac3M3MMuTkbmaWISd3M7MMObmbmWXIyd3MLENO7mZmGXJyNzPLkJO7mVmGnNzNzDLk5G5mliEndzOzDDm5m5llyMndzCxDTu5mZhlycjczy1BDyV3SZElXSXpA0hpJb5I0VdJySQ+l/1OaFayZmdWm0T33rwA/iIjXAK8H1gBnAzdHxEHAzWnazMxaqO7kLunFwFuBiwAi4jcRsRWYD1ySql0CvLvRIM3MbHQUEfUtKL0BWALcT7HXvgo4E9gYEZNL9Z6JiBcMzUhaBCwC6O7uPqKvr69qmwMDA3R1ddUV7+5saL9Xb9zWxmhap3sibN7R7ihgzvRJLW3P23lnaaTfc+fOXRURPZXKGknuPcAK4C0RcYekrwC/BD5WS3Iv6+npiZUrV1Zts7+/n97e3rri3Z0N7fess69vXzAttHjOTi5YPaHdYbDu/ONb2p63887SSL8lDZvcGxlz3wBsiIg70vRVwOHAZkkHpIYPALY00IaZmdWh7uQeEU8C6yUdnGYdTTFEcy2wIM1bAFzTUIRmZjZqjX7n/RhwuaS9gEeAD1J8YCyTdDrwOHBSg22YmdkoNZTcI+IuoNJ4z9GNrNfMzBrjX6iamWWo/aci7MZaddbK4jk7WdghZ8iYWXN4z93MLENO7mZmGXJyNzPLkJO7mVmGnNzNzDLk5G5mliEndzOzDDm5m5llyMndzCxDTu5mZhlycjczy5CTu5lZhpzczcwy5ORuZpYhJ3czswz5eu5mVbTquv2DBq/fv+7841varuWl4T13SXtI+v+SrkvTsyXdIekhSVem+6uamVkLNWNY5kxgTWn6b4ELI+Ig4Bng9Ca0YWZmo9BQcpc0Azge+D9pWsDbgatSlUuAdzfShpmZjZ4iov6FpauALwD7A58EFgIrIuJVqXwmcENEHFZh2UXAIoDu7u4j+vr6qrY3MDBAV1dX3fE22+qN21rSTvdE2LyjJU2NK53e7znTJ7U7lJYab+/vVmmk33Pnzl0VET2Vyuo+oCrpBGBLRKyS1Ds4u0LVip8eEbEEWALQ09MTvb29lartor+/n1rqtUqrblq9eM5OLljdece+O73f607tbXcoLTXe3t+tMlb9buSd8xbgREnHAfsALwa+DEyWNCEidgIzgE2Nh2lmZqNR95h7RHwmImZExCzgZOBHEXEqcAvw3lRtAXBNw1GamdmojMWPmM4C/lzSWuAlwEVj0IaZmY2gKQOaEdEP9KfHjwBHNmO9ZmZWH19+wMwsQ07uZmYZcnI3M8uQk7uZWYac3M3MMuTkbmaWISd3M7MMObmbmWXIyd3MLENO7mZmGeq866ma7SZafe/WQb53ax68525mliEndzOzDDm5m5llyMndzCxDTu5mZhlycjczy5CTu5lZhupO7pJmSrpF0hpJ90k6M82fKmm5pIfS/ynNC9fMzGrRyJ77TmBxRLwWOAo4Q9IhwNnAzRFxEHBzmjYzsxaq+xeqEfEE8ER6/CtJa4DpwHygN1W7hOLG2Wc1FOUI2vUrPjPLRzvzyNJ5+43JehURja9EmgXcBhwGPB4Rk0tlz0TEC4ZmJC0CFgF0d3cf0dfXV7WdgYEBurq6dpm3euO2RkLfLXRPhM072h1F67nf7TFn+qS2tFvp/d0q7cwjsyftUXe/586duyoieiqVNZzcJXUBtwLnRcTVkrbWktzLenp6YuXKlVXb6u/vp7e3d5d5nbDnvnjOTi5Y3XmXAXK/26Nd15ap9P5ulXbvudfbb0nD
JveGzpaRtCfwbeDyiLg6zd4s6YBUfgCwpZE2zMxs9Bo5W0bARcCaiPj7UtG1wIL0eAFwTf3hmZlZPRr57vcW4DRgtaS70ry/AM4Hlkk6HXgcOKmxEM3MbLQaOVvmdkDDFB9d73rNzKxx/oWqmVmGnNzNzDLk5G5mliEndzOzDDm5m5llyMndzCxDTu5mZhnqvAt3mNm41QnXimoV77mbmWXIyd3MLEMeljGzXbRraGTxnJ04JTWP99zNzDLk5G5mliEndzOzDDm5m5llyMndzCxDTu5mZhlycjczy9CYJXdJ8yQ9KGmtpLPHqh0zM3uhMUnukvYA/jdwLHAIcIqkQ8aiLTMze6Gx2nM/ElgbEY9ExG+APmD+GLVlZmZDKCKav1LpvcC8iPjTNH0a8MaI+GipziJgUZo8GHiwhlVPA55qcri7A/e7s7jfnaWRfh8YES+tVDBWF3JQhXm7fIpExBJgyahWKq2MiJ5GAtsdud+dxf3uLGPV77EaltkAzCxNzwA2jVFbZmY2xFgl958CB0maLWkv4GTg2jFqy8zMhhiTYZmI2Cnpo8APgT2AiyPiviaselTDOBlxvzuL+91ZxqTfY3JA1czM2su/UDUzy5CTu5lZhnar5C7p7yQ9IOkeSd+RNLndMbWKpJMk3SfpOUnZny7WiZevkHSxpC2S7m13LK0kaaakWyStSdv4me2OqRUk7SPpTkl3p37/j2auf7dK7sBy4LCIeB3wc+AzbY6nle4F/hi4rd2BjLUOvnzFUmBeu4Nog53A4oh4LXAUcEaHvN7PAm+PiNcDbwDmSTqqWSvfrZJ7RNwYETvT5AqK8+c7QkSsiYhafsWbg468fEVE3AY83e44Wi0inoiIn6XHvwLWANPbG9XYi8JAmtwz/TXtDJfdKrkP8SHghnYHYWNiOrC+NL2BDnizG0iaBfw+cEd7I2kNSXtIugvYAiyPiKb1e6wuP1A3STcB/6lC0Wcj4ppU57MUX+Uub2VsY62WvneIqpevsPxI6gK+DXw8In7Z7nhaISJ+C7whHT/8jqTDIqIpx1zGXXKPiHeMVC5pAXACcHRkdpJ+tb53EF++osNI2pMisV8eEVe3O55Wi4itkvopjrk0JbnvVsMykuYBZwEnRsSv2x2PjRlfvqKDSBJwEbAmIv6+3fG0iqSXDp7xJ2ki8A7ggWatf7dK7sBXgf2B5ZLukvT1dgfUKpLeI2kD8Cbgekk/bHdMYyUdNB+8fMUaYFmTLl8xrkm6AvgJcLCkDZJOb3dMLfIW4DTg7el9fZek49odVAscANwi6R6KHZrlEXFds1buyw+YmWVod9tzNzOzGji5m5llyMndzCxDTu5mZhlycjczy5CT+zgnKSS9Kj3+uqS/bHdMZZKWSvp8G9p9j6T1kgYk/X6r289Rei5fUeeyv9tOK5T1S/rTxqKz0Rp3v1DdXUlaB7wceHlEPFWafxfwemB2RKxrpI2I+HAjy2fmS8BHO+yyDGMqIrraHYM1j/fcm+tR4JTBCUlzgIntCydrBwK7xQ+bJHknylrOyb25LgM+UJpeAFxariBpb0lfkvS4pM1pqGViqfxTkp6QtEnSh4Ys+7shEElTJF0n6ReSnkmPZ5Tq9kv6nKR/kfQrSTdKmlYp6HSThBNK0xMkPSXp8DT9LUlPStom6TZJhw6znoWSbh8yrzysNGLfhyz3IknnSHos3cDiUkmT0joGKG68frekh4dZ/s2Sfppi/qmkN6f5cyWtLtW7SdKdpenbJb07PV4n6ZMqbg6zTdKVkvYp1T0h/Zpyq6QfS3pdqWydpLPSrw+3V0rwqS9fTq/1pvR471L5/LT+X0p6WMXlN5A0VdL/Tcs8I+m7NT7/S9NzvjxtE7dKOrCe12qk7XQYBw63LUo6UcXNKram7fa1Q57HT6XXYLukiyR1S7ohresmSVNK9Y9Kr8VWFTfB6C2VLZT0SFruUUmn1hD37isi/NeEP2AdxbUhHgReS5F81lPsYQYw
K9X7MsV1UqZSXErhe8AXUtk8YDNwGLAf8P/Ssq9K5UuBz6fHLwH+C7BvWs+3gO+W4ukHHgZeTfHtoR84f5jY/4rigk2D08cDD5SmP5Ta2DvFf1eprBzTQuD2Iesuxz9s3yvE9CFgLfAKoAu4Gris0norLDsVeIbiJ+0TKL5NPZOes32AHcC0VPYkxUXJ9k/P0w7gJaXX9E6K4bapFJdC+HAqO5ziMq1vTK/1glR/79Kyd1FcAG3iMHH+DcV9CV4GvBT4MfC5VHYksA04hmInbDrwmlR2PXAlMIXiGuBvq/H5Xwr8Cnhrei2/Uq5f62tFle20Qj/7GWZbTPO2p37uCXw6ve57lZ7HFUB3eg62AD+juCzw3sCPgHNT3enAvwLHpefsmDT90hTnL4GDU90DgEPbnTfGNCe1O4Bc/ng+uZ8DfCG9AZZTJJAAZlFcynY78MrScm8CHk2PL6aUgNOGXzG5V2j/DcAzpel+4JzS9EeAHwyz7KvSm37fNH058FfD1J2cYpo0NCZGSC7V+l6hnZuBj5SmDwb+HZhQXu8wy54G3Dlk3k+AhenxP1Pc1eoo4EZgWXq95gL3DHlN31+a/iLw9fT4a6REXCp/kOcT7TrgQ1W2mYeB40rT7wTWpcf/BFxYYZkDgOeAKRXKhn3+S69VX6msC/gtMHM0r1W17bRCXMNui8BfUlw7aLDsRcBGoLf0PJ5aKv828LXS9MdIOzUUFxW8bEjbP6T44N0P2EqxQ1Txwza3P48FNt9lFLfCm82QIRmKPYh9gVXS7y5ZLoo9Pyj2EFeV6j82XCOS9gUupEhKg19L95e0RxTXiIZir3TQrynezC8QEWslrQHeJel7wIkUe0aDt7w7Dzgpxf9cWmwaxZ5lrar1faiXs2v/H6P4oOymePOPZOiyg8sP3vDjVqCX4tLCt1Ls1b+N4rZntw5Zbuhz+PL0+EBggaSPlcr3KpVD6YYjaQjgn9LkP0fEsRXifKy0/Ezg+xX6NhN4OiKeqVBWi9/FFBEDkp5ObZZvjtK07bRkuG1xl+cgIp6TtJ5db86yufR4R4XpwXUdCJwk6V2l8j2BWyJiu6T3AZ8ELpL0LxS39mvaVRjHG4+5N1lEPEZxYPU4iqGEsqcoNsZDI2Jy+psUz5+l8AS7Xsf890ZoajHF3uwbI+LFFF+1ofKNLmpxBcXwxXzg/ohYm+b/SZr3DmASxTeQ4drZTpEUigpS+cYj1fo+1CaKN+ug36O4QcvmytVHXHZw+cEPhcHk/tb0+FaK5P42Xpjch7MeOK/Ul8kRsW9EXFGq87ur8kXE5RHRlf6OHSbO3+P569avB145TLtTVfnm8CM9/4Nmlsq7KIZdhl4rv5nbaTW7PAcqPk1mUv0DvJL1FHvu5ddkv4g4HyAifhgRx1B8+3kA+EYDcY97Tu5j43SKG99uL8+MiOcoNqgLJb0MQNJ0Se9MVZYBCyUdkvbMzx2hjf0p3oBbJU2tUrcWfcAfAX9GMYZabudZirHLfYH/OcI67gYOlfSGdODxrwcLauj7UFcAn1BxTfeu1O6V8fw9dEfyfeDVkv5ExcHh91HcaHvwcqo/pvhgPJJi+OY+igTzRmq/Afk3gA9LeqMK+0k6XtL+NS4PRR/PUXFd72kUxz6+mcouAj4o6WgVB5enS3pNRDxBcXvJf1RxUH1PSYMf7MM+/yXHSfrPKq6T/zngjogo77U3ezutZhlwfOrnnhQ7Lc9SvEaj9U2Kb5/vVHH7un0k9UqakQ7Cnihpv7T+AYohqWw5uY+BiHg4IlYOU3wWxQGjFZJ+CdxEkWiIiBsoDmT9KNX50QjNfJni4NRTFAecftBgzE9QjEu/meJg3aBLKb42bwTuT20Nt46fUxwkvAl4CLh9SJVh+17BxTw/xPUo8G8U46u19OVfKe7WtZjiQ+nTwAmRfn+QPnR/BtwXxQ24oej7YxGxpcY2VgL/jeIeA8+kfi2sZdmSzwMrgXuA1Smmz6f13wl8kGLo
bRvFN4rBPdzTKI4/PEBxgPHjaZlqzz8UH9znUtyI+whguDNGmrWdjiiKm76/H/hfFNvyu4B3lV6X0axrPcW3zL8AfkGxJ/8pijz3IortYRNF399GMfafLV/P3axDSFoKbIiIc9odi40977mbmWXIyd3MLEMeljEzy5D33M3MMuTkbmaWISd3M7MMObmbmWXIyd3MLEP/AXB7RSnhoUHrAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
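   ],
   "source": [
    "# DataFrame.hist creates its own figure, so no separate plt.figure() call is needed\n",
    "# (calling it first only leaves an extra empty figure behind).\n",
    "# bins sets the number of bars; seaborn's histplot can additionally overlay a kde (kernel density estimate) curve.\n",
    "df.hist(\"MEDV\")  # equivalently: df[\"MEDV\"].hist()\n",
    "plt.xlabel(\"Median value of owner-occupied homes\", fontsize=12)"
   ]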
  },
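  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal seaborn sketch of the same idea, assuming seaborn >= 0.11 (where `histplot`/`displot` replace the deprecated `distplot`). It uses synthetic data instead of the Boston file so that it runs on its own: the left panel is a histogram with a kernel density estimate (`kde=True`) for a continuous variable, and the right panel is a `countplot` for a discrete one. The names `demo`, `medv_like` and `rad_like` are hypothetical and only for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Synthetic stand-ins (hypothetical, not the Boston data):\n",
    "# a continuous, right-skewed variable and a small set of discrete codes\n",
    "rng = np.random.default_rng(0)\n",
    "demo = pd.DataFrame({\n",
    "    \"medv_like\": rng.lognormal(mean=3.0, sigma=0.4, size=500),\n",
    "    \"rad_like\": rng.choice([1, 2, 3, 4, 5, 8, 24], size=500),\n",
    "})\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(10, 4))\n",
    "# Continuous feature: histogram plus a kernel density estimate curve\n",
    "sns.histplot(data=demo, x=\"medv_like\", bins=30, kde=True, ax=axes[0])\n",
    "# Discrete feature: one bar per distinct value, bar height = count\n",
    "sns.countplot(data=demo, x=\"rad_like\", ax=axes[1])\n",
    "plt.tight_layout()"
   ]
  },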
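  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Box plot\n",
    "Shows the spread of the data and its outliers. The box spans the first quartile (25th percentile) to the third quartile (75th percentile), with a line at the median; the violin plot beside it adds a kernel density estimate of the distribution's shape."
   ]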
  },
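  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the quartiles concrete, here is a small worked sketch on made-up numbers. It assumes the usual box-plot convention (matplotlib's default `whis=1.5`, which seaborn's `boxplot` also uses): the box runs from Q1 to Q3, the whiskers reach the most extreme points within 1.5 * IQR of the box, and anything beyond those fences is drawn as an individual outlier point. The quartiles are computed with `np.percentile` and its default linear interpolation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Made-up sample with one obvious outlier (hypothetical data)\n",
    "x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])\n",
    "\n",
    "# 25th, 50th and 75th percentiles (default linear interpolation)\n",
    "q1, median, q3 = np.percentile(x, [25, 50, 75])\n",
    "iqr = q3 - q1  # interquartile range = height of the box\n",
    "lower_fence = q1 - 1.5 * iqr\n",
    "upper_fence = q3 + 1.5 * iqr\n",
    "outliers = x[(x < lower_fence) | (x > upper_fence)]\n",
    "\n",
    "print(f\"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}\")\n",
    "print(f\"fences=({lower_fence}, {upper_fence}), outliers={outliers.tolist()}\")\n",
    "# Q1=3.25, median=5.5, Q3=7.75, IQR=4.5\n",
    "# fences=(-3.5, 14.5), outliers=[100]"
   ]
  },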
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x25f0268b088>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXIAAAD4CAYAAADxeG0DAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3deXhU1cE/8O+ZPTPZJwmBLISQsIqi4l6UKqLggtWqL1QrVkSt4t7ytiLyq1haX8EqiBRbF1rXGmlZhURFdiQgshMChIQESCAEkkxmP78/gFQg+9yZO3fy/TwPz8NNbu75vnj7fW/unHuukFKCiIi0S6d2ACIiCgyLnIhI41jkREQaxyInItI4FjkRkcYZ1Bg0KSlJZmVlqTE0dQIbN248KqVMVmNsntsUTM2d26oUeVZWFgoLC9UYmjoBIcQBtcbmuU3B1Ny5zVsrREQaxyInItI4FjkRkcaxyImINI5FTkSkcSxyIiKNY5ETEWkci5yIVOH3++H3+9WOERFY5EQUcnv27MFNN92E66+/Hv/4xz/UjqN5LHIiCrnNmzfD4/HAb4rGd999p3YczWORE1HIFRcXQ5is8Malo3jvXt5iCRCLnIhCbuu2bXBbk+CzJaHB4UBpaanakTSNRU5EIVVdXY2K8nL4o1Pgi+kCAPjhhx9UTqVtLHIiCqkz98S9sWmQ5ljAzPvkgWKRE1FIrV27FsJkg9+aCAgBd2w6Cgs3wuVyqR1Ns1jkRBQyDQ0NWLt2HVxxGYAQAABvQne4XE5elQeARU5EIbNmzRq43S54E3s0fs0X2xXCaMFXX32lYjJtY5ETUcgsXboUMEc3fsgJABA6uBJ6YNWq1aitrVUvnIaxyIkoJI4dO4YNGzbAlZgNiLOrx2PPgdfrwfLly9UJp3EsciIKiWXLlkFKCU9S7nnf89uSIK0JWLxkiQrJtI9FTkRBJ6XE4iVL4I9OgbTEnb+DEHAl5mDnjh18OKgDWOREFHTFxcUoKy2F257T7D5ee08A4IeeHcAiJ6Kg+/rrrwGhgycxq9l9pMkKX2xXLMsvgJQydOEiAIs8zI0ZMwZDhgzB2LFj1Y5C1GErV66CLyYVMFha3M+TkIVDFeUoKysLUbLIEHCRCyEsQojvhBA/CCG2CyH+nxLB6JSSkhIAp341JdKiQ4cO4eDBMnjiM1rd1xt3ap/169cHO1ZEUeKK3AXgeinlRQAGArhZCHGlAsft9MaMGXPWNq/KSYu2bdsGAPDFdG11X2mOBiyxjT9DbWMI9ADy1M2sutObxtN/eINLAWeuxs/gVTlp0a5duyD0Bvij4tu0v8dqx/YdO4KcKrIoco9cCKEXQmwGUAkgX0p53u9FQohxQohCIURhVVWVEsMShQWe2y2rqKiA3xJ73kNAzfFb4nDs6FF4PJ4gJ4scihS5lNInpRwIIB3A5UKIC5rYZ46UcpCUclBycrISwxKFBZ7bLTtSWQmfwdbm/aUpGlJKHD16NIipIouis1aklDUAlgO4Wcnjdlbp6elnbWdlZakThCgADU4npL7td3Gl7tS+brc7WJEijhKzVpKFEPGn/x4FYCiAXYEel4D+/fuftd27d2+VkhB1nN/nByDa/gOnl7f1+XzBCRSBlLgi7wrgGyHEFgAbcOoe+UIFjtvpFRQUtLhNpAVWaxSEvx33u32n9o2KigpSosijxKyVLQAuViALnePcKxJeoZAWxcfFQVSWt3l/ndcJAIiLa2JNFmoSn+wMY3q9vsVtIi1ITU2FwVPf5v2Fqw5WWzSsVmsQU0UWFnkYGzp0aIvbRFqQlpYG6apvvGXSGp3rJNLSugU5VWRhkYexcePGQZz+4EcIgXHjxqmciKj9unfvDgDQNRxv0/5G1wlk9+jR+o7UiEUexux2e+MUxPT0dNjtdpUTEbVfz56nlqfVO6pb3Vd4nJCuemRnZwc7VkRhkYexY8eO4dCh
QwCAw4cP49ixYyonImq/1NRUWKKs0DW0XuQ6x6lzPDf3/LcIUfNY5GHsgw8+aJyp4vV6MXfuXJUTEbWfTqdDbk5PGNpwRa47vc+Zq3hqGxZ5GMvPz29cYF9KiWXLlqmciKhjcnJyoHfWAK28MELfUI2ERDunHrYTizyMdenSpcVtIq3Izs6G9Loh3HUt7mdw1iA3h1fj7cUiD2OHDx9ucZtIKxpnrjhPNL+TlNA5TzbuS23HIg9jqampLW4TaUVaWhoAQOc82ew+wuOA9HnOWyyOWsciD2NHjhxpcZtIKxITE2E0mqBzNX9r5cz3unZt/U1CdDYWeRi79tprW9wm0gohBJKSkyDczT+qf+b+eUpKSqhiRQwWeRiTrXzCT6QlyUlJ0Hkbmv2+8Jz6Hh98az8WeRhbuXLlWdsrVqxQKQlR4BISEqD3OZv9vvA4odPrER0dHcJUkYFFHsaSkpJa3CbSkpiYGAhf82/9ET43bLboxvWFqO1Y5GGsvLy8xW0iLYmOjob0uJr9vvC5YLO1/d2e9F8s8jDm9/tb3CbSkqioKMDvA2TT57HweWHjGuQdwiIPY3yxBEWSxle3+b1N7+D3whJlCV2gCMIiD2PnTsPiI/qkZSaTCQAg/E2/slBIHyxmcygjRQwWeRg79wEgPqJPWnamyNFMkeukD0ajMYSJIgeLPIydO4+c88pJyxpvDTZ3jxwSBkPA74PvlFjkYezcaViclkVaptOdqZtmLkik/NE+1B78VwtjgwcPPmubj+gTUVMCLnIhRIYQ4hshxE4hxHYhxFNKBCNegVNk+e/02ebOa8Epth2kxA0pL4DnpJSbhBAxADYKIfKllDsUOHantmrVqha3ibTE7T79VKeu6drx6/TweDwhTBQ5Ar4il1IeklJuOv33WgA7AaQFelzih50UWZzOU+usSF3Tz0P4hR4OhyOUkSKGovfIhRBZAC4GsL6J740TQhQKIQqrqqqUHDZi3XDDDWdtDx06VKUk1BKe221z4sQJQAhAb2ry+9JgRs3J5l88Qc1TrMiFENEA8gA8LaU877+GlHKOlHKQlHJQcnKyUsNGtEceeeSs7XHjxqmUhFrCc7ttampqIIxRp8q8CdIQhePVx0OcKjIoUuRCCCNOlfiHUsovlDgmnXLmA09+8ElaV1FRAZ+p+UWxpNmGutqTvL3SAUrMWhEA/g5gp5RyeuCR6IwPPvig8SEKvV6PuXPnqpyIqONKy8rgM8U0+32/ORYAV/nsCCWuyK8BcD+A64UQm0//GaHAcTu9goICeL2nFhjyer3Iz89XORFRx5w4cQJHq6rgtyY2u4/v9Pf27NkTqlgRQ4lZK6uklEJKeaGUcuDpP4uVCNfZDR069KxbKzfeeKPKiYg6Zvfu3QAAn7X517hJcyyEwYSdO3eGKlbE4JOdYez2229vnHIopcRtt92mciKijtm0aROg08EX3cKLlYWAx9YFhRs3hS5YhGCRh7H58+eftb1gwQKVkhAFZt367+CzpQD6llc39MZ2w6GKchw6dChEySIDizyMFRQUnLXNe+SkRQcPHkTJ/n3wxGe2uq83PgMAXzTeXizyMPaTn/zkrO1zF9Ei0oLly5cDALwJWa3uKy2xkLYkFHz1VXBDRRgWeRhrXJviNJer+RfXEoUjKSUWLV4MX0wqpDm6TT/jSszGnqIi7N+/P8jpIgeLPIyd++slf90krdmyZQsOVVTAnZTb5p/x2nsCOh0WLVoUxGSRhUUexs5d0pNLfJLWzJs3D8JghjehR5t/Rhqj4InvjsVLlqChoSGI6SIHi5yIgqKyshIrVqyEy54D6Nu3YrYnpS8c9fVYtmxZkNJFFhY5EQXFvHnz4Jd+uLv0a/fP+qK7wG9Lwmf/+hd/E20Dvum0jWbMmIHi4uKQjmkymc76wNNkMuGpp0L3AqacnByMHz8+ZONR5HA4HPjP/PnwxHeHNDe9voq5dB0AwJV55fnfFAKuLv1Rvu9brF+/Hldd
dVUw42oer8jDWPfu3c/azsrKUicIUTstXrwYjvp6uFMvaHYfnaMaOkd1s9/3JvQAzNH4+ONPghExovCKvI3UujIdNmwY3G430tPTMWfOHFUyELWH1+vFp599Bn9MF/hbeiS/NTodnCn9sGXLd9i1axf69OmjXMgIwyvyMNe9e3fodDpMnjxZ7ShEbbJixQpUVVbC2WVAwMfyJPeCMJjw6aefKpAscrHIw5zVasWAAQOQk5OjdhSiVkkp8fEnnwBRcfCdftw+IHoTXEm9sXz5cq6/0gIWOREpZuvWrdhTVARnSr9mX+nWXu6UvpA4NQuGmsYiJyLFfP55HoTRDI9dud8gpTkanoQsLFiwkK+BawaLnIgUUVlZiVWrVsJp79XqcrXt5e7SHw0NDq4A2gwWOREpYsGCBfBLCU+K8rNL/LZk+G1J+Dzvi8aXrdB/sciJKGBerxcLFiyENy692QeAAiIEXMl9UFZ6AFu3blX++BrHIieigK1ZswY1NcfhTu4dtDG8iT0gDGa+KasJLHIiCtjChQsBsw2+uPTgDaI3wpXQA8uXL0dtbW3wxtEgFjkRBaSqqgobNmyAKzEHEMGtFE9yL3g8Hnz99ddBHUdrWOREFJD8/HxIKeFpx8sjOspvtUNaE7Hkyy+DPpaWsMiJqMOklFjy5ZfwR6dAWmKDP6AQcCX2xK6dO1FWVhb88TRCkSIXQrwrhKgUQmxT4nhEpA3FxcUoKy2FW8EHgFrjtfcEhEBBQUHIxgx3Sl2Rvw/gZoWORUQakZ+fD+h08CS2/VVugZImK3wxXfHl0qWcU36aIkUupVwBoPmFhYko4vh8PizLL4AnNh0wmEM6ttveE0cOH8aOHTtCOm64Ctk9ciHEOCFEoRCisKqqKlTDEgVdZz23v//+e9Qcrz51qyPEvAndIXQGPrJ/WsiKXEo5R0o5SEo5KDk5OVTDEgVdZz23ly1bBmEww6vEcrXtpTfBHZ+B/IKv4PF4Qj9+mOGsFSJqN4fDgW+/XQFXfHdAp86Lxjz2HNTX1WLt2rWqjB9OWORE1G5fffUVXC4nPMm9VMvgi0sDzDYsWLBQtQzhQqnphx8DWAugtxDioBDiISWOS0ThR0qJf//7P5DWRPhtKt5KEjq47LnYULgBFRUV6uUIA0rNWhklpewqpTRKKdOllH9X4rhEFH42b96MvXuL4Urpq9hbgDrKk9wbEAJ5eXmq5lAbb60QUbv845//hDBGwaPCbJVzSZMNnoRsLFiwENXVnXcGNIuciNqssLAQmzZuREPqANU+5DyXq9tFcHvcmDt3rtpRVMMiJ6I28Xg8mDFzJmCJgSelr9pxGklLHNxJvTF//gIUFxerHUcVLHIiapO5c+fiQEkJHBlXADq92nHO4kq/BNJgxh+nTu2U88pZ5ETUqsLCQnz44YfwJOXCF5+pdpzzGSxwZF6FfXv3YtasWWqnCTkWORG1qKSkBC9OmgSfJR7OzCvVjtMsb0J3uLv0x7x58zBv3jy144QUi5yImnXgwAE8+9xzcHqB+pyhgN6odqQWuTIugzc+E2+++SaWLFmidpyQYZETUZOKiorwxBPjcby2AXW5wyDN0WpHap3QoSH7OnhjuuLPf/4zPv/8c7UThQSLnIjOU1BQgPHjn0St24/a3sPhtyaqHant9EY4cm+EN6E7Zs6cienTp8PlcqmdKqhY5ETUyOVyYfr06ZgyZQoaTHGo63MrpCVO7Vjtp9OjoedP4U4dgPnz5+Pxx5+I6Mf4WeREBADYsmULHhr7MObPnw9X6gDU9xoOabKqHavjhA6ujMvgyBmKvSWlePBXv0JeXh58Pp/ayRTHIifq5Orq6jB9+nQ8+eSTOFh1HI5ew+DOuAzQRUY9+BIyUdtvJBxmO2bMmIFf//px7N27V+1YigqPZ2yJKOS8Xi/mz5+P995/H7W1tXB36Q9X2iVhPzOlI6Q5Go7cYTBU70PR/u/w8MMP45ZbbsGYMWNgt9vVjhcwFjlR
JyOlxKpVqzDr7dk4VFEOX2xXOPsOgd+WpHa04BICXntPnIxLg7l8MxYsXIRly/IxevQo3HPPPYiKilI7YYexyIk6CSkl1q1bh3ffew97ioogo+LRkHsjfHHpqi9HG1IGC1zdr4S7S194Dm7Ee++9h7y8LzB69CiMHDlSk4XOIieKcFJKrF27Fu+9/z72FBUBllg0ZP0E3qQcQETGffCOkJY4OHOuh7uuEr6K7zF79mx89NHHmix0FjlRhPL5fPj222/xj3/+E/v37ftvgdtzVP8g01y6DnrHMQBA1K7F8FsT4VLp8X9/dAocvW6CrvYIfIc2Y/bs2fjnhx/inrvvxs9+9jPExMSokqs9NFXkM2bM6HTLVJ75v/epp55SOUlo5eTkYPz48WrH0CS3241ly5bhw48+wqGKCiAqHg09BsOb2FP1Aj9D56iG8J1apdBQexhelfMAgD+mCxwxN0FXVwnvoR/w7rvv4sOPPsIdI0finnvuCesPRTVV5MXFxdi8bSd8WnrKLEA6twQAbNx3ROUkoaN3dN43vQTC4XBgwYIF+OTTz3C8+hj8tmS4el4Pb0L3znUPPED+6BQ05N4InaMankNb8Olnn+HzvDyMGD4c9957L9LT09WOeB5NFTkA+KyJaOgzQu0YFERRuxarHUFTampq8MUXXyAv7wvU19fBF9sNrl43wRfbjQUeAL81Ec6eQ+ByXgLT4W1YuGgxFi5ciCFDhmD06NHIzc1VO2IjzRU5EZ1y/PhxfPrpp5g3799wuZzwJmTClTkE/ugUtaNFFGmJhSvrari7DYTxyHYsX7ka33zzDa6++mo88MAD6N27t9oRWeREWlNdXY1PPvkE//73f+D2uOFJ6AF37kXwRyWoHS2iSZMV7ozL4O56IUyVO7F2w0asWbMGV1xxJcaMeQB9+6r3+jsWOZFGNDQ04LPPPsNHH30Ml9sFT2I2XF0HQkZpcFErLTOY4e42EO4u/WA6sgPfbdqM9evX4brrrsO4ceOQlpYW+kghH5GI2sXn82Hp0qWY887fUHO8Gp6ELLh6XarNVQkjid50utD7w3R4G1asWoNVq1bhjjvuwC9/+UvExYXuv48ic5GEEDcLIXYLIYqFEP+rxDGJCCgrK8P48U/i1VdfRbXHgPo+t8CZcz1LPJzojXCnXYzaC+5CQ2JP5H3xBe6//5f49ttvQxYh4CtyIYQewFsAbgRwEMAGIcR8KeWOQI9N1Fn5fD7k5eXhnXfegRc6NPS4Fl57T85CCWPSZIUr6yfwpPSDv2QVXnrpJfz0p9fjqaeeRHx8fFDHVuLWyuUAiqWU+wBACPEJgJEAWOREHeByuTBlyhSsXLkS3vhMOLtfre11wTsZvzUR9X1uhenwFnzz7XJs274N0157DZmZmUEbU4lbK2kAyn60ffD0184ihBgnhCgUQhRWVVUpMCxReFDy3K6trcWzzz2HlStXwplxBRpybmCJa5FOB3e3gajvcyuO1tTi148/ju3btwdvOAWO0dTvevK8L0g5R0o5SEo5KDk5WYFhicKDUue23+/Hi5MmYceOnWjIHgJPan/eStE4vy0Jtb1vQZ1H4Pnnf4MjR4LzhLYSRX4QQMaPttMBRO7L8YiCZN68edj8/fdoyLwSXnu22nFIIdISi7peN8Hp9mDqn/4Ev9+v+BhK3CPfACBXCNEDQDmA/wEwWoHjnqe8vBx6xwk+wh3h9I5jKC8Ph2WUQsflcmHOnHfgjUuHJ6mX2nGCz+eGxWLBrbfeioULF6LO51Y7UVBJcwwaMi7H5u9XY926dbj66qsVPX7AV+RSSi+AJwAsBbATwGdSyuDdDCKKQLt374bL5YQnuXenuJ0ivG7ceuuteOKJJ3DLLbdAeCO7yAHAY88BdHps3rxZ8WMr8kCQlHIxgKBfJqelpeGwy8BFsyJc1K7FSEvronaMkDqzXLHPGr5LpSpJGkxYuHAhpJRYtGgRpKETfKCr08MflYCioj3KH1rxIxJR
u6WmpgIAdO46lZOEiN4Ep9OJvLw8OJ1OQG9SO1HwSQm9px7dunVV/NAscqIwcGYFPX1NWSt7klbp6qsg3Q1BWS2RRU4UBux2O2688UaYK3dAOE+qHYeUJiWiytYjPiEBQ4cOVfzwLHKiMPHII4/AbDTCun8FcPo1aBQZTOWboKurwmOPPgqbzab48VnkRGEiKSkJEye+AH19Fax7vwb8PrUjkQKMR3bAfOgHjBgxAsOGDQvKGCxyojAyePBgPP/889CfKId1Tz7QCablRSwpYar4AZbSdbjmmmvw7LPPQgRpaimLnCjM3HLLLZgwYQKMdUcQvXsRhKtW7UjUXn4fLCWrYC7fiBtuuAGTJk2CwRC81z+wyInC0PDhw/Haa/8Hm3AjZucCzmbREOGqg233EhiP7sEDDzyAiRMnwmw2B3VMFjlRmLrkkksw569/RVZGN1j35MNUtgEIwjodpBx9TSlidv4HVl8tJk+ejAcffDBot1N+jEVOFMbS09Mx++23cdttt8F8eCtsuxdBOE+oHYvO5fPCfGANrHsKkJ2Zgb+98w6GDBkSsuFZ5ERhzmw247nnnsPkyZMRIxsQs2M+jFW7AXneatGkAl39McTsnA9T5S7cfffdmDXrLaSnp4c0A1++TKQRQ4YMQb9+/fDHqVOx+fvVMJwog7P7NZDGKLWjdU7SD9PhrTBXfI+E+AS88PJrGDRokCpRNFfkekd1p1rGVnf6KT+/JVblJKGjd1QD6FyLZrVVSkoKpk+bhn/961+Y8847MO74D+q7XwNffEbrP0yKEa5aWPevhK72MK699jo899yziItT74XYmirynJwctSOEXHHxqalnOdmdqdi6dMr/1m2l0+lw7733YtCgQXj55Sko2ZMPd0o/uDIGATpN/U9akwzH9sFaugZmox7P/O53GDZsWEg+0Gwxk6qjt9P48ePVjhByTz31FADgjTfeUDkJhZuePXvir3+djTlz5iAvLw/GusNwZF8Hf1SC2tEik88DS+k6GI/uQe++ffHSpEno2lX5lQw7gh92EmmY2WzG+PHjMXXqVMQavIjeuRCGo8Vqx2qV35oIqTdC6o3wxqTCb01UO1KLdA3HEbNrIYzHinHfffdh5owZYVPiAIucKCJcddVVeP+993BBv76I2r8C5pLVgD98X5fnyrwSPqsdPqsdDX1GwJV5pdqRmmU4thfROxcg1igxfdo0jB07NqhPaXYEi5woQtjtdrz++nSMGjUKpqrdsO1eAuGuVzuWdkk/zKXrELXvW/Tv1xfv/v1vuOSSS9RO1SQWOVEEMRgMeOSRR/Dyyy8jylOLmJ0LoKurVDuW9nhdsBYtg+nIDtx11134y+uvIykpSe1UzWKRE0WgwYMH4+23ZyElIRbRu5fAcGyf2pE0QzhPIGbXQpgclZgwYQLGjx8fdrdSzsUiJ4pQ2dnZmDPnr+jfvx+i9i2HqWIznwZthb72MGJ2LUKMQeIvr7+O4cOHqx2pTVjkRBEsLi4O06dNw9ChQ2Eu3wTzgdWA5MJbTTFU74e1aCm6dUnC7NlvY8CAAWpHajMWOVGEM5lMeOGFF3DffffBVFWEqOKv+Cq5cxgPb0fU3m/Qv29fvD1rFtLS0tSO1C4BFbkQ4m4hxHYhhF8Ioc4iA0TUKiEExo4di2eeeQbGEwdhK1oK4XGqHUt9UsJctgGWsvUYPHgwpk+fhthY7S2HEegV+TYAdwJYoUAWIgqykSNH4g9/+APMruN8+5DfD8v+FTAd3oqRI0di8uTJQX8BRLAEVORSyp1Syt1KhSGi4Bs8eDCmTZsGq86LmF2LoHMcUztS6Pk8sBbnw3hsLx566CE8/fTT0Ov1aqfqsJDdIxdCjBNCFAohCquqqkI1LFHQafHcvvDCC/HWzJlIjLEievcS6E8eUjtSyAhPA2y7v4Sx9hB++9vf4v7771d90atAtVrkQogCIcS2Jv6MbM9AUso5UspBUspBycnJHU9MFGa0em736NED
b789C+ndusK2Z1mnmGsunCcRvWsRzJ4TeOWVVzBixAi1Iymi1VnuUsqhoQhCRKGXkpKCWW/NxO9+/3ts27ocTk89PKnamXbXHrq6KkQX58NmMeLPf3od/fv3VzuSYjj9kKiTi4mJwbTXXsO1114HS9kGmEvXR9yDQ/qaUkQXfYnkxHi8PWtWRJU4EPj0w58JIQ4CuArAIiHEUmViEVEomc1mvPTSJNx1110wHdkOy95vwnr1xPYwVu2Gtfgr9MzOwtuz3kJGRuS9TSnQWSvzpJTpUkqzlLKLlPImpYIRUWjp9Xo88cQTeOyxx2A8XgJb0TLA61I7VsdJCVP5JlhKVuOyQZfhzTfegN1uVztVUPDWChE1EkLg3nvvxYsvvgijowrRWl0KV/phLlkNc8Vm3HzzzZg69Y+wWq1qpwoaFjkRneeGG27Aq6++iijZgOhdiyAaTqgdqe38XkQVfw3T0SLcd999mDBhQtivXhgoFjkRNenSSy/FjDffRKzFgJjdi6CrP6p2pNb53LDuyYehphRPPvkkxo4dq/k54m3BIieiZuXm5mLWWzORlBCL6KIvoa89onak5nldsBUthbGuEhMnTsSdd96pdqKQYZETUYvS09Px1syZ6NolGbY9y8LyKVDhaUB00ZcwOo9jypSXMXRo53r8hUVORK1KSUnBjDffREZ6N9iK88OrzL1O2IqWwuQ+iT9NnYqrr75a7UQhxyInojax2+148403kJ7WDbbiAuhrD6sdCfC6EF20FEZ3LaZOnYrLLrtM7USqYJETUZvFx8fjL6+/jq6pKbAVF6i7cqLPC9uefBhcJ/DKK1MwaFDnfSUCi5yI2sVut+Mvr7+OxLhY2PbkQzhPhj6E9CNq3zfQ1Vdh0osv4oorrgh9hjDCIieidktJScG0aa/BZtIjurggtE+ASgnzgXUw1JThmaefxnXXXRe6scMUi5yIOiQrKwt/fGUK9O5aWPd+A/hD81JnY+UOmKp2YdSoURg5sl2raUcsFjkRddhFF12E559/HvqTFTAf/C7o4+lPVsBS9h2uueYaPPzww0EfTytY5EQUkOHDh59eNXEHDMf2Bm0c4aqDbd9yZGZm4oUXXoBOx/o6g/8SRBSwxx57DBcMGADrgQ/1gcoAAAZeSURBVDXQNdQoP4DfB+u+b2Ax6PDKlCkRvQBWR7DIiShgBoMBk196CdE2K6z7liu+lrm5fCN0dVWYMOG3EbmeeKBY5ESkiKSkJLw48QUIRzXMZcrdL9efKIfp8DbcfvvtGDJkiGLHjSQsciJSzOWXX467774bpspd0NeUBX5ArxO2kpXIyMjE448/HvjxIhSLnIgUNXbsWHTPyoLtwOqA55dbDqyFzufCpEkvwmw2K5Qw8rDIiUhRZrMZL/z+9xBeJyyl6zt8HMPxAzBW78cDDzyA3NxcBRNGHhY5ESmuV69e+MUvfgHjsWLoT5S3/wBeN6yla5HdsydGjx6tfMAIwyIPcw6HA1u3bkVxcbHaUYja5f7770e3tDRYS9e2exaLuXwj4GnAb3/zm4h/TZsS+C/URjNmzFClTIuLiyGlxOOPP44+ffqEdOycnByMHz8+pGNS5DCZTPjN88/jmWeegenQVrjTLm7Tz+nqj8FUuRM/u/POkJ/zWsUr8jDmcDggpQQAuFwuNDQ0qJyIqH0uvvhiDBkyBJYjWyFcdWd9z29NhN+aePYPSImosvWIiY3Fgw8+GMKk2sYr8jZS48p0zJgxZ2273W7MmTMn5DmIAvHoo49i9erVMJdvgjP72savuzKvPG9fQ00pdLWH8fCzzyImJiaUMTUtoCtyIcT/CSF2CSG2CCHmCSHilQpGQElJSYvbRFqQmpqKO++8E8bqvdA1HG9+R+mHpWITuqWlYcSIEaELGAECvbWSD+ACKeWFAIoA/C7wSHRGVlZWi9tEWjFq1ChYzBaYKjY3u4+hugTCcRxjH3qIH3C2U0BFLqVcJqU883H0OgDpgUeiMyZO
nNjiNpFWxMfH4447RsJ4vKTpNwpJCcuRbeiWlsYXRXSAkh92/grAkua+KYQYJ4QoFEIUVlVVKThs5MrJyWm8Cs/KykJOTo66gahJPLfb5uc//zn0Oh1MlTvP+56+7ghE/VGMHjUKer1ehXTa1mqRCyEKhBDbmvgz8kf7vADAC+DD5o4jpZwjpRwkpRyUnJysTPpOYOLEibDZbLwaD2M8t9smKSkJgwcPhrm6+Lx55cbKXYiyWjF06FCV0mlbqzeipJQt/ssKIR4AcCuAG+SZuXKkmJycHCxatEjtGESKuO2227B8+XIYjpfCa88+9UWfG8aaA7jp9ttgsVjUDahRgc5auRnABAC3SykdykQiokg1cOBAxMcnwHC8pPFrhuOlgN/Hq/EABHqPfCaAGAD5QojNQojZCmQiogil1+sxZMh1MJ082Hh7xVBzAIl2O/r166dyOu0KdNZKjpQyQ0o58PSfR5UKRkSR6fLLL4f0eaGvqwSkH6baw7jyiiv4Ds4AcLImEYXUwIEDodPpoD95CFJvgvS6cOmll6odS9P4/wKJKKSsVit69MiGvr4K+vpT0zX79++vciptY5ETUcj17dsHxoZq6OqPIjomFl26dFE7kqaxyIko5LKysiA9Thhqj6BndjaEEGpH0jQWORGFXGZmJgBA5zqJzMwMldNoH4uciEIuNTW1yb9Tx7DIiSjkfryUAZc1CByLnIhCLioqqvHvCQkJKiaJDCxyIlJVbGys2hE0j0VORKqKjo5WO4LmsciJSBVn1laJj+cbIgPFR/SJSBXTp09HfX09bDab2lE0j0VORKqwWCxcf1whvLVCRKRxLHIiIo1jkRMRaRyLnIhI41jkREQaxyInItI4FjkRkcYJKWXoBxWiCsCBkA+sXUkAjqodQkO6SylVWVKP53a78dxunybPbVWKnNpHCFEopRykdg4ipfHcVgZvrRARaRyLnIhI41jk2jBH7QBEQcJzWwG8R05EpHG8Iici0jgWORGRxrHIw5wQ4mYhxG4hRLEQ4n/VzkOkBJ7XyuI98jAmhNADKAJwI4CDADYAGCWl3KFqMKIA8LxWHq/Iw9vlAIqllPuklG4AnwAYqXImokDxvFYYizy8pQEo+9H2wdNfI9IyntcKY5GHN9HE13gvjLSO57XCWOTh7SCAjB9tpwOoUCkLkVJ4XiuMRR7eNgDIFUL0EEKYAPwPgPkqZyIKFM9rhRnUDkDNk1J6hRBPAFgKQA/gXSnldpVjEQWE57XyOP2QiEjjeGuFiEjjWORERBrHIici0jgWORGRxrHIiYg0jkVORKRxLHIiIo37/ydYtYTi5gFqAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 1 row, 2 columns; sharey/sharex make the subplots share the y/x axis\n",
    "_, axes = plt.subplots(1, 2, sharey=True, figsize=(6, 4))\n",
    "sns.boxplot(data=df[\"MEDV\"], ax=axes[0]);\n",
    "sns.violinplot(data=df[\"MEDV\"], ax=axes[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. For two continuous features, which function gives the correlation between them? Give an example based on the code output."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
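    "Recall: there are really three kinds of feature pairs (numeric-numeric: correlation matrix or scatter plot, with dimensionality reduction when features are strongly correlated;\n",
    "numeric-categorical: color/size encoding; categorical-categorical: encoding or a contingency table).\n",
    "The question only asks about continuous features."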
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Call `.corr()` on the DataFrame read with pandas, or loop over the feature pairs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'data_color' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-6-6c4b3dcda79e>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m      4\u001b[0m \u001b[0mplt\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0msubplots\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfigsize\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;36m13\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;36m9\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      5\u001b[0m \u001b[1;31m# sns.heatmap(data_color, annot=False)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 6\u001b[1;33m \u001b[0msns\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mheatmap\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdata_corr\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmask\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mdata_color\u001b[0m \u001b[1;33m<\u001b[0m \u001b[1;36m0.5\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcbar\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mFalse\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mannot\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mTrue\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;31m# mask表示掩膜， annot表示数值显示与否，\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m      7\u001b[0m \u001b[1;31m#cbar即colorbar表示色度条显示与否\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      8\u001b[0m \u001b[0mplt\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mshow\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mNameError\u001b[0m: name 'data_color' is not defined"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAwIAAAIMCAYAAABGwRt8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAVxElEQVR4nO3dX4jn913v8de7WaNQawvuHpDsagJurWsR4hlCD72w0p7DJhe7N1WyULQSujcnilqEiFIlXtlyKAjxz6qlKtgYe6GLrORCIx7ElEzpOcGkBIaozRAha83JTWljznmfixl75kxmd77Z+c1mNu/HAxZ+39/vM795X3yY2ed8v7/fr7o7AADALG97swcAAABuPiEAAAADCQEAABhICAAAwEBCAAAABhICAAAw0L4hUFWfqaqXqurvr/F4VdWvV9VGVT1dVT+0+jEBAIBVWnJG4LNJzl7n8XuTnN7+dzHJbx58LAAA4DDtGwLd/TdJ/vU6S84n+YPe8mSSd1XVd61qQAAAYPVW8RqBO5K8sON4c/s+AADgiDq2gueoPe7rPRdWXczW5UN5+9vf/h/f8573rODbAwDATF/84hf/pbtP3MjXriIENpOc2nF8MsmLey3s7ktJLiXJ2tpar6+vr+DbAwDATFX1Tzf6tau4NOhykh/ffveg9yV5pbv/eQXPCwAAHJJ9zwhU1eeSfCDJ8araTPLLSb4lSbr7t5JcSXJfko0kX0vyk4c1LAAAsBr7hkB3X9jn8U7yX1c2EQAAcOh8sjAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADLQoBKrqbFU9V1UbVfXQHo9/d1U9UVVfqqqnq+q+1Y8KAACsyr4hUFW3JXkkyb1JziS5UFVndi37pSSPdffdSe5P8hurHhQAAFidJWcE7kmy0d3Pd/erSR5Ncn7Xmk7yHdu335nkxdWNCAAArNqSELgjyQs7jje379vpV5J8pKo2k1xJ8lN7PVFVXayq9apav3r16g2MCwAArMKSEKg97utdxxeSfLa7Tya5L8kfVtXrnru7L3X3WnevnThx4o1PCwAArMSSENhMcmrH8cm8/tKfB5I8liTd/XdJvi3J8VUMCAAArN6SEHgqyemququqbs/Wi4Ev71rzlSQfTJKq+v5shYBrfwAA4IjaNwS6+7UkDyZ5PMmXs/XuQM9U1cNVdW572ceTfKyq/meSzyX5aHfvvnwIAAA4Io4tWdTdV7L1IuCd931ix+1nk7x/taMBAACHxScLAwDAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAA
YCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAi0Kgqs5W1XNVtVFVD11jzY9V1bNV9UxV/dFqxwQAAFbp2H4Lquq2JI8k+c9JNpM8VVWXu/vZHWtOJ/mFJO/v7per6j8c1sAAAMDBLTkjcE+Sje5+vrtfTfJokvO71nwsySPd/XKSdPdLqx0TAABYpSUhcEeSF3Ycb27ft9O7k7y7qv62qp6sqrN7PVFVXayq9apav3r16o1NDAAAHNiSEKg97utdx8eSnE7ygSQXkvxuVb3rdV/Ufam717p77cSJE290VgAAYEWWhMBmklM7jk8meXGPNX/W3f/W3f+Q5LlshQEAAHAELQmBp5Kcrqq7qur2JPcnubxrzZ8m+ZEkqarj2bpU6PlVDgoAAKzOviHQ3a8leTDJ40m+nOSx7n6mqh6uqnPbyx5P8tWqejbJE0l+vru/elhDAwAAB1Pduy/3vznW1tZ6fX39TfneAADwVlBVX+zutRv5Wp8sDAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADLQqBqjpbVc9V1UZVPXSddR+uqq6qtdWNCAAArNq+IVBVtyV5JMm9Sc4kuVBVZ/ZY944kP53kC6seEgAAWK0lZwTuSbLR3c9396tJHk1yfo91v5rkk0m+vsL5AACAQ7AkBO5I8sKO483t+76pqu5Ocqq7//x6T1RVF6tqvarWr169+oaHBQAAVmNJCNQe9/U3H6x6W5JPJ/n4fk/U3Ze6e627106cOLF8SgAAYKWWhMBmklM7jk8meXHH8TuSvDfJX1fVPyZ5X5LLXjAMAABH15IQeCrJ6aq6q6puT3J/ksv//mB3v9Ldx7v7zu6+M8mTSc519/qhTAwAABzYviHQ3a8leTDJ40m+nOSx7n6m
qh6uqnOHPSAAALB6x5Ys6u4rSa7suu8T11j7gYOPBQAAHCafLAwAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAy0Kgao6W1XPVdVGVT20x+M/V1XPVtXTVfWXVfU9qx8VAABYlX1DoKpuS/JIknuTnElyoarO7Fr2pSRr3f2DST6f5JOrHhQAAFidJWcE7kmy0d3Pd/erSR5Ncn7ngu5+oru/tn34ZJKTqx0TAABYpSUhcEeSF3Ycb27fdy0PJPmLvR6oqotVtV5V61evXl0+JQAAsFJLQqD2uK/3XFj1kSRrST611+Pdfam717p77cSJE8unBAAAVurYgjWbSU7tOD6Z5MXdi6rqQ0l+MckPd/c3VjMeAABwGJacEXgqyemququqbk9yf5LLOxdU1d1JfjvJue5+afVjAgAAq7RvCHT3a0keTPJ4ki8neay7n6mqh6vq3PayTyX59iR/UlX/o6ouX+PpAACAI2DJpUHp7itJruy67xM7bn9oxXMBAACHyCcLAwDAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYSAgAAMBAQgAAAAYSAgAAMJAQAACAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADCQEAAAgIGEAAAADCQEAABgICEAAAADCQEAABhICAAAwEBCAAAABhICAAAwkBAAAICBhAAAAAwkBAAAYCAhAAAAAwkBAAAYaFEIVNXZqnquqjaq6qE9Hv/Wqvrj7ce/UFV3rnpQAABgdfYNgaq6LckjSe5NcibJhao6s2vZA0le7u7vTfLpJL+26kEBAIDVWXJG4J4kG939fHe/muTRJOd3rTmf5Pe3b38+yQerqlY3JgAAsEpLQuCOJC/sON7cvm/PNd39WpJXknznKgYEAABW79iCNXv9Zb9vYE2q6mKSi9uH36iqv1/w/eFajif5lzd7CG5p9hAHZQ9xUPYQB/V9N/qFS0Jg
M8mpHccnk7x4jTWbVXUsyTuT/OvuJ+ruS0kuJUlVrXf32o0MDYk9xMHZQxyUPcRB2UMcVFWt3+jXLrk06Kkkp6vqrqq6Pcn9SS7vWnM5yU9s3/5wkr/q7tedEQAAAI6Gfc8IdPdrVfVgkseT3JbkM939TFU9nGS9uy8n+b0kf1hVG9k6E3D/YQ4NAAAczJJLg9LdV5Jc2XXfJ3bc/nqSH32D3/vSG1wPu9lDHJQ9xEHZQxyUPcRB3fAeKlfwAADAPIs+WRgAAHhrOfQQqKqzVfVcVW1U1UN7PP6tVfXH249/oaruPOyZuLUs2EM/V1XPVtXTVfWXVfU9b8acHF377aEd6z5cVV1V3sGD/8+SPVRVP7b9s+iZqvqjmz0jR9uC32XfXVVPVNWXtn+f3fdmzMnRVVWfqaqXrvX2+7Xl17f32NNV9UP7PeehhkBV3ZbkkST3JjmT5EJVndm17IEkL3f39yb5dJJfO8yZuLUs3ENfSrLW3T+YrU+2/uTNnZKjbOEeSlW9I8lPJ/nCzZ2Qo27JHqqq00l+Icn7u/sHkvzMTR+UI2vhz6FfSvJYd9+drTdd+Y2bOyW3gM8mOXudx+9Ncnr738Ukv7nfEx72GYF7kmx09/Pd/WqSR5Oc37XmfJLf3779+SQfrKq9PqCMmfbdQ939RHd/bfvwyWx91gX8uyU/h5LkV7MVkV+/mcNxS1iyhz6W5JHufjlJuvulmzwjR9uSPdRJvmP79jvz+s9sYrju/pvs8TldO5xP8ge95ckk76qq77recx52CNyR5IUdx5vb9+25prtfS/JKku885Lm4dSzZQzs9kOQvDnUibjX77qGqujvJqe7+85s5GLeMJT+H3p3k3VX1t1X1ZFVd7692zLNkD/1Kko9U1Wa23qnxp27OaLyFvNH/My17+9AD2Osv+7vfpmjJGuZavD+q6iNJ1pL88KFOxK3munuoqt6WrcsSP3qzBuKWs+Tn0LFsnY7/QLbOSv73qnpvd/+vQ56NW8OSPXQhyWe7+79V1X/K1uczvbe7/8/hj8dbxBv+P/VhnxHYTHJqx/HJvP5U1zfXVNWxbJ0Ou95pD2ZZsodSVR9K8otJznX3N27SbNwa9ttD70jy3iR/XVX/mOR9SS57wTA7LP1d9mfd/W/d/Q9JnstWGECybA89kOSxJOnuv0vybUmO35TpeKtY9H+mnQ47BJ5Kcrqq7qqq27P14pfLu9ZcTvIT27c/nOSv2ocb8P/su4e2L+v47WxFgOty2e26e6i7X+nu4919Z3ffma3XmZzr7vU3Z1yOoCW/y/40yY8kSVUdz9alQs/f1Ck5ypbsoa8k+WCSVNX3ZysErt7UKbnVXU7y49vvHvS+JK909z9f7wsO9dKg7n6tqh5M8niS25J8prufqaqHk6x39+Ukv5et018b2ToTcP9hzsStZeEe+lSSb0/yJ9uvM/9Kd59704bmSFm4h+CaFu6hx5P8l6p6Nsn/TvLz3f3VN29qjpKFe+jjSX6nqn42W5dzfNQfRtmpqj6XrcsPj2+/luSXk3xLknT3b2XrtSX3JdlI8rUkP7nvc9pjAAAwj08WBgCAgYQAAAAMJAQAAGAgIQAAAAMJAQAAGEgIAADAQEIAAAAGEgIAADDQ/wWkfsAnX49wwQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 936x648 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
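    "cols = df.columns\n",
    "data_corr = df.corr()          # pairwise Pearson correlation matrix\n",
    "data_corr = data_corr.abs()    # take absolute values so strong negative correlations count too\n",
    "plt.subplots(figsize=(13, 9))\n",
    "# sns.heatmap(data_corr, annot=False)  # alternative: full heatmap without a mask\n",
    "# mask hides the cells where it is True (here |r| < 0.5); annot prints the value in each cell;\n",
    "# cbar controls whether the colorbar is shown\n",
    "sns.heatmap(data_corr, mask=data_corr < 0.5, cbar=False, annot=True)\n",
    "plt.show()"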
   ]
  },
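  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of what `DataFrame.corr()` computes, using hypothetical toy columns `a`, `b`, `c` rather than the housing data: by default it returns the Pearson correlation of every pair of columns. For a single pair, `Series.corr` and `np.corrcoef` should give the same value, and so should the definition (covariance divided by the product of the standard deviations)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data: b is a noisy linear function of a, c is independent noise\n",
    "rng = np.random.default_rng(0)\n",
    "a = rng.normal(size=200)\n",
    "toy = pd.DataFrame({'a': a,\n",
    "                    'b': 2 * a + rng.normal(scale=0.5, size=200),\n",
    "                    'c': rng.normal(size=200)})\n",
    "\n",
    "toy_corr = toy.corr()                        # Pearson correlation matrix (method='pearson' is the default)\n",
    "r_pair = toy['a'].corr(toy['b'])             # the same coefficient for one pair of columns\n",
    "r_numpy = np.corrcoef(toy['a'], toy['b'])[0, 1]\n",
    "\n",
    "# Pearson r from its definition: sum of products of deviations / sqrt(product of sums of squared deviations)\n",
    "da = toy['a'] - toy['a'].mean()\n",
    "db = toy['b'] - toy['b'].mean()\n",
    "r_manual = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())\n",
    "\n",
    "print(toy_corr.round(2))\n",
    "print(round(r_pair, 4), round(r_numpy, 4), round(r_manual, 4))"
   ]
  },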
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "threshold = 0.5  # |r| >= 0.5 is commonly treated as a fairly strong correlation\n",
    "corr_list = []\n",
    "size = data_corr.shape[0]\n",
    "\n",
    "# data_corr already holds absolute values, so a single comparison covers positive and negative correlations;\n",
    "# only the upper triangle (j > i) is visited, so each pair is listed once\n",
    "for i in range(0, size):\n",
    "    for j in range(i + 1, size):\n",
    "        if threshold <= data_corr.iloc[i, j] < 1:\n",
    "            corr_list.append([data_corr.iloc[i, j], i, j])  # iloc is pandas' position-based indexer\n",
    "s_corr_list = sorted(corr_list, key=lambda x: -abs(x[0]))\n",
    "for v, i, j in s_corr_list:\n",
    "    print(\"%s and %s = %.2f\" % (cols[i], cols[j], v))"
   ]
  },
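  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pair search with the double loop can also be written without explicit Python loops. A sketch assuming only pandas and NumPy: keep the strict upper triangle of the absolute correlation matrix (so each pair appears once and the diagonal is dropped), flatten it with `stack()`, and filter by the threshold. The helper `strong_pairs` and the toy columns are hypothetical; unlike the loop, this version does not exclude pairs whose |r| is exactly 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "def strong_pairs(frame, threshold=0.5):\n",
    "    # Hypothetical helper: |Pearson r| for every pair of columns\n",
    "    abs_corr = frame.corr().abs()\n",
    "    # Boolean mask of the strict upper triangle (k=1 excludes the diagonal)\n",
    "    upper_mask = np.triu(np.ones(abs_corr.shape, dtype=bool), k=1)\n",
    "    # where() sets cells outside the upper triangle to NaN; stack() flattens to a (col_i, col_j) index\n",
    "    pairs = abs_corr.where(upper_mask).stack().dropna()\n",
    "    return pairs[pairs >= threshold].sort_values(ascending=False)\n",
    "\n",
    "# Usage on hypothetical toy data: x_noisy is strongly correlated with x, z is independent\n",
    "rng = np.random.default_rng(1)\n",
    "x = rng.normal(size=300)\n",
    "toy = pd.DataFrame({'x': x,\n",
    "                    'x_noisy': x + rng.normal(scale=0.3, size=300),\n",
    "                    'z': rng.normal(size=300)})\n",
    "result = strong_pairs(toy, threshold=0.5)\n",
    "print(result)"
   ]
  },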
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. If strong correlations are found between features, what measures should be taken when choosing a linear regression model?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
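    "## Reduce the dimensionality or add regularization.\n",
    "When features are collinear, use an L2 penalty (ridge regression), which shrinks the coefficients and keeps them stable; when there are many input features and some are only weakly related to the target, use an L1 penalty (Lasso), which can set some coefficients exactly to 0; the L1 and L2 penalties can also be combined, e.g. ElasticNet."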
   ]
  },
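  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small synthetic illustration of the answer above. The data and the `alpha` values are assumptions chosen for the demo, not tuned on the housing data: `x2` is an almost exact copy of `x1`, and `x3` has no effect on `y`. Plain least squares can split the weight between `x1` and `x2` erratically (only their sum is well determined), ridge tends to share it roughly equally, and Lasso typically sets the irrelevant `x3` to exactly 0; ElasticNet mixes both behaviours."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet\n",
    "\n",
    "# Hypothetical synthetic data: x2 is nearly a copy of x1 (strong collinearity); x3 is irrelevant to y\n",
    "rng = np.random.default_rng(42)\n",
    "n = 200\n",
    "x1 = rng.normal(size=n)\n",
    "x2 = x1 + rng.normal(scale=0.01, size=n)\n",
    "x3 = rng.normal(size=n)\n",
    "X_demo = np.column_stack([x1, x2, x3])\n",
    "y_demo = 3 * x1 + rng.normal(scale=0.5, size=n)\n",
    "\n",
    "# Illustrative (untuned) regularization strengths\n",
    "demo_models = {\n",
    "    'OLS': LinearRegression(),\n",
    "    'Ridge (L2)': Ridge(alpha=1.0),\n",
    "    'Lasso (L1)': Lasso(alpha=0.2, max_iter=10000),\n",
    "    'ElasticNet (L1+L2)': ElasticNet(alpha=0.2, l1_ratio=0.5, max_iter=10000),\n",
    "}\n",
    "demo_coefs = {}\n",
    "for name, model in demo_models.items():\n",
    "    model.fit(X_demo, y_demo)\n",
    "    demo_coefs[name] = model.coef_\n",
    "    print(f'{name:20s} coef (x1, x2, x3) =', np.round(model.coef_, 3))"
   ]
  },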
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
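    "# 4. When using regularized models or the stochastic gradient descent optimizer, the continuous input features must first be rescaled to remove the effect of their units. The course code showed results with standardization (StandardScaler); switch to min-max scaling (MinMaxScaler) instead, and retrain the ordinary least squares linear regression, ridge regression, and Lasso models."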
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np  # array/matrix operations\n",
    "import pandas as pd # tabular (SQL-like) data processing\n",
    "df = pd.read_csv(\"boston_housing.csv\")\n",
    "# df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ordinary least squares linear regression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
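    "# Separate the input features X and the target y from the raw data\n",
    "y = df['MEDV']\n",
    "X = df.drop('MEDV', axis = 1)\n",
    "feat_names = X.columns  # keep the feature names; X becomes a NumPy array after scaling\n",
    "\n",
    "# Also try a log transform of y (house price) and model the log-transformed price\n",
    "log_y = np.log1p(y)"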
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
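    "# Min-max scaling: rescale each feature to [0, 1] using the training-set min and max\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Split before scaling so the test set does not leak into the scaler statistics\n",
    "# (random_state=33 and test_size=0.2 are assumed values; use the course code's split if it differs)\n",
    "X_train, X_test, y_train, y_test, log_y_train, log_y_test = train_test_split(\n",
    "    X, y, log_y, random_state=33, test_size=0.2)\n",
    "\n",
    "# Separate scalers for the features, the price, and the log price\n",
    "ss_X = MinMaxScaler()\n",
    "ss_y = MinMaxScaler()\n",
    "ss_log_y = MinMaxScaler()\n",
    "\n",
    "# Call fit on the training data only to learn the scaling parameters, then transform both training and test data\n",
    "X_train = ss_X.fit_transform(X_train)\n",
    "X_test = ss_X.transform(X_test)\n",
    "\n",
    "# Scaling y is not required; it keeps the weights w on a similar scale across problems\n",
    "# and keeps the useful range of the regularization parameter narrow\n",
    "y_train = ss_y.fit_transform(y_train.values.reshape(-1, 1))\n",
    "y_test = ss_y.transform(y_test.values.reshape(-1, 1))\n",
    "log_y_train = ss_log_y.fit_transform(log_y_train.values.reshape(-1, 1))\n",
    "log_y_test = ss_log_y.transform(log_y_test.values.reshape(-1, 1))\n",
    "\n",
    "# Stack both targets as columns (0: price, 1: log price) so each model is fit to both at once;\n",
    "# this matches the coef_[0, :] / coef_[1, :] indexing used in the coefficient tables\n",
    "y_train = np.hstack([y_train, log_y_train])\n",
    "y_test = np.hstack([y_test, log_y_test])"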
   ]
  },
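  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick check of what `MinMaxScaler` does, on a hypothetical 3-row training array (not the housing data): it maps each column with x' = (x - min) / (max - min), using the min and max learned from the training data, so a test value outside the training range lands outside [0, 1]. `StandardScaler`, used in the course code, instead computes (x - mean) / std with the population standard deviation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.preprocessing import MinMaxScaler, StandardScaler\n",
    "\n",
    "# Hypothetical toy data: two features on very different scales\n",
    "demo_train = np.array([[1.0, 200.0], [2.0, 400.0], [4.0, 800.0]])\n",
    "demo_test = np.array([[3.0, 1000.0]])\n",
    "\n",
    "# Min-max scaling: the min and max come from the training data only\n",
    "mm = MinMaxScaler().fit(demo_train)\n",
    "col_min, col_max = demo_train.min(axis=0), demo_train.max(axis=0)\n",
    "mm_manual = (demo_test - col_min) / (col_max - col_min)\n",
    "print(mm.transform(demo_test), mm_manual)   # the second feature exceeds 1 (1000 > training max 800)\n",
    "\n",
    "# Standardization for comparison: mean and population std (ddof=0) of the training data\n",
    "std_scaler = StandardScaler().fit(demo_train)\n",
    "ss_manual = (demo_test - demo_train.mean(axis=0)) / demo_train.std(axis=0)\n",
    "print(std_scaler.transform(demo_test), ss_manual)"
   ]
  },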
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>columns</th>\n",
       "      <th>coef_org</th>\n",
       "      <th>coef_log</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>20</td>\n",
       "      <td>RAD_24</td>\n",
       "      <td>0.505543</td>\n",
       "      <td>0.524147</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>RM</td>\n",
       "      <td>0.297984</td>\n",
       "      <td>0.176170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>18</td>\n",
       "      <td>RAD_7</td>\n",
       "      <td>0.184468</td>\n",
       "      <td>0.157229</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>ZN</td>\n",
       "      <td>0.147403</td>\n",
       "      <td>0.094180</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>19</td>\n",
       "      <td>RAD_8</td>\n",
       "      <td>0.147279</td>\n",
       "      <td>0.093588</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>14</td>\n",
       "      <td>RAD_3</td>\n",
       "      <td>0.134516</td>\n",
       "      <td>0.058141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>B</td>\n",
       "      <td>0.088869</td>\n",
       "      <td>0.100203</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>CHAS</td>\n",
       "      <td>0.074143</td>\n",
       "      <td>0.071025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>INDUS</td>\n",
       "      <td>0.017002</td>\n",
       "      <td>0.048219</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>AGE</td>\n",
       "      <td>-0.001742</td>\n",
       "      <td>0.004709</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>15</td>\n",
       "      <td>RAD_4</td>\n",
       "      <td>-0.042599</td>\n",
       "      <td>-0.095117</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>16</td>\n",
       "      <td>RAD_5</td>\n",
       "      <td>-0.056407</td>\n",
       "      <td>-0.051296</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>CRIM</td>\n",
       "      <td>-0.104955</td>\n",
       "      <td>-0.199881</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>NOX</td>\n",
       "      <td>-0.176803</td>\n",
       "      <td>-0.175318</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>TAX</td>\n",
       "      <td>-0.178692</td>\n",
       "      <td>-0.236459</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>13</td>\n",
       "      <td>RAD_2</td>\n",
       "      <td>-0.200660</td>\n",
       "      <td>-0.157749</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>PTRATIO</td>\n",
       "      <td>-0.209666</td>\n",
       "      <td>-0.179215</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>17</td>\n",
       "      <td>RAD_6</td>\n",
       "      <td>-0.272976</td>\n",
       "      <td>-0.175632</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>DIS</td>\n",
       "      <td>-0.361874</td>\n",
       "      <td>-0.274028</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>RAD_1</td>\n",
       "      <td>-0.399165</td>\n",
       "      <td>-0.353311</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>LSTAT</td>\n",
       "      <td>-0.459577</td>\n",
       "      <td>-0.542390</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    columns  coef_org  coef_log\n",
       "20   RAD_24  0.505543  0.524147\n",
       "5        RM  0.297984  0.176170\n",
       "18    RAD_7  0.184468  0.157229\n",
       "1        ZN  0.147403  0.094180\n",
       "19    RAD_8  0.147279  0.093588\n",
       "14    RAD_3  0.134516  0.058141\n",
       "10        B  0.088869  0.100203\n",
       "3      CHAS  0.074143  0.071025\n",
       "2     INDUS  0.017002  0.048219\n",
       "6       AGE -0.001742  0.004709\n",
       "15    RAD_4 -0.042599 -0.095117\n",
       "16    RAD_5 -0.056407 -0.051296\n",
       "0      CRIM -0.104955 -0.199881\n",
       "4       NOX -0.176803 -0.175318\n",
       "8       TAX -0.178692 -0.236459\n",
       "13    RAD_2 -0.200660 -0.157749\n",
       "9   PTRATIO -0.209666 -0.179215\n",
       "17    RAD_6 -0.272976 -0.175632\n",
       "7       DIS -0.361874 -0.274028\n",
       "12    RAD_1 -0.399165 -0.353311\n",
       "11    LSTAT -0.459577 -0.542390"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Linear regression\n",
    "#class sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)\n",
    "from sklearn.linear_model import LinearRegression\n",
    "\n",
    "# 1. Initialize the estimator with the default configuration\n",
    "lr = LinearRegression()\n",
    "\n",
    "# 2. Fit the model parameters on the training data\n",
    "lr.fit(X_train, y_train)\n",
    "\n",
    "# 3. Use the trained model to predict on the test set (and on the training set, for comparison)\n",
    "y_test_pred_lr = lr.predict(X_test)\n",
    "y_train_pred_lr = lr.predict(X_train)\n",
    "\n",
    "# Inspect each feature's weight; since all features are scaled to the same range,\n",
    "# the absolute value of a coefficient gives a rough measure of that feature's importance\n",
    "fs = pd.DataFrame({\"columns\": list(feat_names), \"coef_org\": lr.coef_[0, :], \"coef_log\": lr.coef_[1, :]})\n",
    "fs.sort_values(by=['coef_org'], ascending=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The r2 score of LinearRegression on test is 0.7011422200388672\n",
      "The r2 score of LinearRegression on train is 0.7830578427150898\n"
     ]
    }
   ],
   "source": [
    "from sklearn.metrics import r2_score\n",
    "\n",
    "# Evaluate the model with r2_score on the test and training sets and print the results\n",
    "# Test set\n",
    "print('The r2 score of LinearRegression on test is', r2_score(y_test, y_test_pred_lr))\n",
    "# Training set\n",
    "print('The r2 score of LinearRegression on train is', r2_score(y_train, y_train_pred_lr))"
   ]
  },
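  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, a sketch of what `r2_score` computes, on hypothetical numbers: R² = 1 - SS_res / SS_tot for each target column. When `y_true` has several columns, as the two-column coefficient tables here suggest, the default `multioutput='uniform_average'` returns the plain mean of the per-column scores."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics import r2_score\n",
    "\n",
    "# Hypothetical 2-column targets (say, price and log price) and predictions\n",
    "y_true_demo = np.array([[3.0, 1.0], [5.0, 2.0], [7.0, 4.0], [9.0, 5.0]])\n",
    "y_pred_demo = np.array([[2.5, 1.2], [5.5, 1.8], [6.5, 4.1], [9.5, 4.6]])\n",
    "\n",
    "def r2_manual(y_true, y_pred):\n",
    "    # R^2 = 1 - SS_res / SS_tot, computed separately for each column\n",
    "    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)\n",
    "    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)\n",
    "    return 1 - ss_res / ss_tot\n",
    "\n",
    "per_column = r2_manual(y_true_demo, y_pred_demo)\n",
    "print(per_column, r2_score(y_true_demo, y_pred_demo, multioutput='raw_values'))\n",
    "print(per_column.mean(), r2_score(y_true_demo, y_pred_demo))"
   ]
  },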
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 岭回归"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>columns</th>\n",
       "      <th>coef_org</th>\n",
       "      <th>coef_log</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>20</td>\n",
       "      <td>RAD_24</td>\n",
       "      <td>0.505543</td>\n",
       "      <td>0.524147</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>RM</td>\n",
       "      <td>0.297984</td>\n",
       "      <td>0.176170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>18</td>\n",
       "      <td>RAD_7</td>\n",
       "      <td>0.184468</td>\n",
       "      <td>0.157229</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>ZN</td>\n",
       "      <td>0.147403</td>\n",
       "      <td>0.094180</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>19</td>\n",
       "      <td>RAD_8</td>\n",
       "      <td>0.147279</td>\n",
       "      <td>0.093588</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>14</td>\n",
       "      <td>RAD_3</td>\n",
       "      <td>0.134516</td>\n",
       "      <td>0.058141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>B</td>\n",
       "      <td>0.088869</td>\n",
       "      <td>0.100203</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>CHAS</td>\n",
       "      <td>0.074143</td>\n",
       "      <td>0.071025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>INDUS</td>\n",
       "      <td>0.017002</td>\n",
       "      <td>0.048219</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>AGE</td>\n",
       "      <td>-0.001742</td>\n",
       "      <td>0.004709</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>15</td>\n",
       "      <td>RAD_4</td>\n",
       "      <td>-0.042599</td>\n",
       "      <td>-0.095117</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>16</td>\n",
       "      <td>RAD_5</td>\n",
       "      <td>-0.056407</td>\n",
       "      <td>-0.051296</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>CRIM</td>\n",
       "      <td>-0.104955</td>\n",
       "      <td>-0.199881</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>NOX</td>\n",
       "      <td>-0.176803</td>\n",
       "      <td>-0.175318</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>TAX</td>\n",
       "      <td>-0.178692</td>\n",
       "      <td>-0.236459</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>13</td>\n",
       "      <td>RAD_2</td>\n",
       "      <td>-0.200660</td>\n",
       "      <td>-0.157749</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>PTRATIO</td>\n",
       "      <td>-0.209666</td>\n",
       "      <td>-0.179215</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>17</td>\n",
       "      <td>RAD_6</td>\n",
       "      <td>-0.272976</td>\n",
       "      <td>-0.175632</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>DIS</td>\n",
       "      <td>-0.361874</td>\n",
       "      <td>-0.274028</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>RAD_1</td>\n",
       "      <td>-0.399165</td>\n",
       "      <td>-0.353311</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>LSTAT</td>\n",
       "      <td>-0.459577</td>\n",
       "      <td>-0.542390</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    columns  coef_org  coef_log\n",
       "20   RAD_24  0.505543  0.524147\n",
       "5        RM  0.297984  0.176170\n",
       "18    RAD_7  0.184468  0.157229\n",
       "1        ZN  0.147403  0.094180\n",
       "19    RAD_8  0.147279  0.093588\n",
       "14    RAD_3  0.134516  0.058141\n",
       "10        B  0.088869  0.100203\n",
       "3      CHAS  0.074143  0.071025\n",
       "2     INDUS  0.017002  0.048219\n",
       "6       AGE -0.001742  0.004709\n",
       "15    RAD_4 -0.042599 -0.095117\n",
       "16    RAD_5 -0.056407 -0.051296\n",
       "0      CRIM -0.104955 -0.199881\n",
       "4       NOX -0.176803 -0.175318\n",
       "8       TAX -0.178692 -0.236459\n",
       "13    RAD_2 -0.200660 -0.157749\n",
       "9   PTRATIO -0.209666 -0.179215\n",
       "17    RAD_6 -0.272976 -0.175632\n",
       "7       DIS -0.361874 -0.274028\n",
       "12    RAD_1 -0.399165 -0.353311\n",
       "11    LSTAT -0.459577 -0.542390"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Ridge regression / L2 regularization\n",
    "#class sklearn.linear_model.RidgeCV(alphas=(0.1, 1.0, 10.0), fit_intercept=True, \n",
    "#                                  normalize=False, scoring=None, cv=None, gcv_mode=None, \n",
    "#                                  store_cv_values=False)\n",
    "from sklearn.linear_model import RidgeCV\n",
    "\n",
    "# 1. Set the candidate values of the hyperparameter (the regularization strength alpha)\n",
    "alphas = [0.01, 0.1, 1, 10, 100]\n",
    "#n_alphas = 20\n",
    "#alphas = np.logspace(-5,2,n_alphas)\n",
    "\n",
    "# 2. Create a RidgeCV instance; with cv=None it picks alpha by efficient leave-one-out cross-validation\n",
    "ridge = RidgeCV(alphas=alphas, store_cv_values=True)\n",
    "\n",
    "# 3. Train the model\n",
    "ridge.fit(X_train, y_train)\n",
    "\n",
    "# 4. Predict\n",
    "y_test_pred_ridge = ridge.predict(X_test)\n",
    "y_train_pred_ridge = ridge.predict(X_train)\n",
    "\n",
    "# Inspect each feature's weight in the ridge model; the absolute value gives a rough measure of importance\n",
    "fs = pd.DataFrame({\"columns\": list(feat_names), \"coef_org\": ridge.coef_[0, :], \"coef_log\": ridge.coef_[1, :]})\n",
    "fs.sort_values(by=['coef_org'], ascending=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The r2 score of RidgeCV on test is 0.7018965347172295\n",
      "The r2 score of RidgeCV on train is 0.7829300524909117\n"
     ]
    }
   ],
   "source": [
    "from sklearn.metrics import r2_score\n",
    "\n",
    "# Evaluate the model with r2_score on the test and training sets\n",
    "print('The r2 score of RidgeCV on test is', r2_score(y_test, y_test_pred_ridge))\n",
    "print('The r2 score of RidgeCV on train is', r2_score(y_train, y_train_pred_ridge))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lasso model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "D:\\anaconda\\ana\\lib\\site-packages\\sklearn\\model_selection\\_split.py:1978: FutureWarning: The default value of cv will change from 3 to 5 in version 0.22. Specify it explicitly to silence this warning.\n",
      "  warnings.warn(CV_WARNING, FutureWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>columns</th>\n",
       "      <th>coef_org</th>\n",
       "      <th>coef_log</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>20</td>\n",
       "      <td>RAD_24</td>\n",
       "      <td>0.505543</td>\n",
       "      <td>0.524147</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>RM</td>\n",
       "      <td>0.297984</td>\n",
       "      <td>0.176170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>18</td>\n",
       "      <td>RAD_7</td>\n",
       "      <td>0.184468</td>\n",
       "      <td>0.157229</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>ZN</td>\n",
       "      <td>0.147403</td>\n",
       "      <td>0.094180</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>19</td>\n",
       "      <td>RAD_8</td>\n",
       "      <td>0.147279</td>\n",
       "      <td>0.093588</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>14</td>\n",
       "      <td>RAD_3</td>\n",
       "      <td>0.134516</td>\n",
       "      <td>0.058141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>B</td>\n",
       "      <td>0.088869</td>\n",
       "      <td>0.100203</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>CHAS</td>\n",
       "      <td>0.074143</td>\n",
       "      <td>0.071025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>INDUS</td>\n",
       "      <td>0.017002</td>\n",
       "      <td>0.048219</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>AGE</td>\n",
       "      <td>-0.001742</td>\n",
       "      <td>0.004709</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>15</td>\n",
       "      <td>RAD_4</td>\n",
       "      <td>-0.042599</td>\n",
       "      <td>-0.095117</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>16</td>\n",
       "      <td>RAD_5</td>\n",
       "      <td>-0.056407</td>\n",
       "      <td>-0.051296</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>CRIM</td>\n",
       "      <td>-0.104955</td>\n",
       "      <td>-0.199881</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>NOX</td>\n",
       "      <td>-0.176803</td>\n",
       "      <td>-0.175318</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>TAX</td>\n",
       "      <td>-0.178692</td>\n",
       "      <td>-0.236459</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>13</td>\n",
       "      <td>RAD_2</td>\n",
       "      <td>-0.200660</td>\n",
       "      <td>-0.157749</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>PTRATIO</td>\n",
       "      <td>-0.209666</td>\n",
       "      <td>-0.179215</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>17</td>\n",
       "      <td>RAD_6</td>\n",
       "      <td>-0.272976</td>\n",
       "      <td>-0.175632</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>DIS</td>\n",
       "      <td>-0.361874</td>\n",
       "      <td>-0.274028</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>RAD_1</td>\n",
       "      <td>-0.399165</td>\n",
       "      <td>-0.353311</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>LSTAT</td>\n",
       "      <td>-0.459577</td>\n",
       "      <td>-0.542390</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    columns  coef_org  coef_log\n",
       "20   RAD_24  0.505543  0.524147\n",
       "5        RM  0.297984  0.176170\n",
       "18    RAD_7  0.184468  0.157229\n",
       "1        ZN  0.147403  0.094180\n",
       "19    RAD_8  0.147279  0.093588\n",
       "14    RAD_3  0.134516  0.058141\n",
       "10        B  0.088869  0.100203\n",
       "3      CHAS  0.074143  0.071025\n",
       "2     INDUS  0.017002  0.048219\n",
       "6       AGE -0.001742  0.004709\n",
       "15    RAD_4 -0.042599 -0.095117\n",
       "16    RAD_5 -0.056407 -0.051296\n",
       "0      CRIM -0.104955 -0.199881\n",
       "4       NOX -0.176803 -0.175318\n",
       "8       TAX -0.178692 -0.236459\n",
       "13    RAD_2 -0.200660 -0.157749\n",
       "9   PTRATIO -0.209666 -0.179215\n",
       "17    RAD_6 -0.272976 -0.175632\n",
       "7       DIS -0.361874 -0.274028\n",
       "12    RAD_1 -0.399165 -0.353311\n",
       "11    LSTAT -0.459577 -0.542390"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
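    "#### Lasso / L1 regularization\n",
    "# Signature of the single-target estimator:\n",
    "# class sklearn.linear_model.LassoCV(eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, \n",
    "#                                    normalize=False, precompute='auto', max_iter=1000, \n",
    "#                                    tol=0.0001, copy_X=True, cv=None, verbose=False, n_jobs=1,\n",
    "#                                    positive=False, random_state=None, selection='cyclic')\n",
    "# y_train has two target columns (MEDV and log_MEDV), so the multi-target variant MultiTaskLassoCV is used below.\n",
    "from sklearn.linear_model import MultiTaskLassoCV\n",
    "\n",
    "# 1. Set the hyperparameter search range (option A: an explicit list)\n",
    "#alphas = [0.01, 0.1, 1, 10, 100]\n",
    "\n",
    "# 2. Create the estimator instance\n",
    "#lasso = MultiTaskLassoCV(alphas=alphas, cv=5)\n",
    "\n",
    "# 1. Set the hyperparameter search range (option B: the default)\n",
    "# Lasso can determine the largest useful alpha automatically (the smallest alpha at which all coefficients are zero),\n",
    "# so instead you can set only the ratio eps = alpha_min / alpha_max and the number of candidates n_alphas;\n",
    "# LassoCV then takes n_alphas values evenly spaced on a log scale between the minimum and the maximum:\n",
    "# np.logspace(np.log10(alpha_max * eps), np.log10(alpha_max), num=n_alphas)[::-1]\n",
    "\n",
    "# 2. Create the estimator with the default search range (cv=5 is set explicitly to silence the FutureWarning)\n",
    "lasso = MultiTaskLassoCV(cv=5)\n",
    "\n",
    "# 3. Train (cross-validation is done internally)\n",
    "lasso.fit(X_train, y_train)\n",
    "\n",
    "# 4. Predict\n",
    "y_test_pred_lasso = lasso.predict(X_test)\n",
    "y_train_pred_lasso = lasso.predict(X_train)\n",
    "# Inspect each feature's coefficient; with standardized features, its absolute value can be read as the feature's importance\n",
    "# coef_ has shape (n_targets, n_features): row 0 belongs to MEDV, row 1 to log_MEDV\n",
    "fs = pd.DataFrame({\"columns\": list(feat_names), \"coef_org\": list(lasso.coef_[0, :]), \"coef_log\": list(lasso.coef_[1, :])})\n",
    "fs.sort_values(by=['coef_org'], ascending=False)"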
   ]
  },
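  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The option B comment describes how LassoCV builds its default alpha grid. A minimal sketch of that rule on synthetic data from `make_regression` (an assumption; this is not the housing data): `alpha_max` is the smallest alpha at which every Lasso coefficient is exactly zero, i.e. `max(abs(Xc.T @ yc)) / n_samples` for centered `Xc`, `yc`, given scikit-learn's objective `(1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1`; the grid is then log-spaced from `alpha_max * eps` up to `alpha_max`. It illustrates the rule and is not scikit-learn's internal code.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.datasets import make_regression\n",
    "from sklearn.linear_model import Lasso, LassoCV\n",
    "\n",
    "# Synthetic single-target data (illustrative assumption, not the housing data)\n",
    "X, y = make_regression(n_samples=200, n_features=10, n_informative=4,\n",
    "                       noise=10.0, random_state=0)\n",
    "eps, n_alphas = 1e-3, 100  # LassoCV defaults\n",
    "\n",
    "# With fit_intercept=True the data are centered before computing alpha_max\n",
    "Xc = X - X.mean(axis=0)\n",
    "yc = y - y.mean()\n",
    "alpha_max = np.max(np.abs(Xc.T @ yc)) / X.shape[0]\n",
    "\n",
    "# n_alphas values evenly spaced in log10 from alpha_max * eps to alpha_max, largest first\n",
    "alphas = np.logspace(np.log10(alpha_max * eps), np.log10(alpha_max), num=n_alphas)[::-1]\n",
    "\n",
    "# Just above alpha_max every coefficient is zero; well below it some are not\n",
    "coef_above = Lasso(alpha=1.001 * alpha_max).fit(X, y).coef_\n",
    "coef_below = Lasso(alpha=0.5 * alpha_max).fit(X, y).coef_\n",
    "print('non-zero just above alpha_max:', np.count_nonzero(coef_above))\n",
    "print('non-zero at alpha_max / 2:', np.count_nonzero(coef_below))\n",
    "\n",
    "# Compare with the grid LassoCV builds internally (default eps and n_alphas)\n",
    "cv_model = LassoCV(cv=5).fit(X, y)\n",
    "print('matches LassoCV.alphas_:', np.allclose(cv_model.alphas_, alphas))\n",
    "```"
   ]
  },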
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The r2 score of LassoCV on test is 0.7015071438181055\n",
      "The r2 score of LassoCV on train is 0.7828893968548662\n"
     ]
    }
   ],
   "source": [
    "# Evaluate: use r2_score to measure model performance on the test set and the training set\n",
    "print ('The r2 score of LassoCV on test is', r2_score(y_test, y_test_pred_lasso))\n",
    "print ('The r2 score of LassoCV on train is', r2_score(y_train, y_train_pred_lasso))"
   ]
  },
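  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since `y_train` and `y_test` here appear to hold two target columns (MEDV and log_MEDV), the single number printed above is, under scikit-learn's default `multioutput='uniform_average'`, the plain mean of the two per-target R² values. A small sketch with made-up numbers (hypothetical, for illustration only) that computes R² = 1 - SS_res / SS_tot per column and checks it against `r2_score`:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.metrics import r2_score\n",
    "\n",
    "# Made-up two-target values, for illustration only\n",
    "y_true = np.array([[3.0, 1.1], [2.5, 0.9], [4.0, 1.4], [5.0, 1.6]])\n",
    "y_pred = np.array([[2.8, 1.0], [2.7, 1.0], [3.9, 1.3], [4.6, 1.7]])\n",
    "\n",
    "def r2_by_hand(t, p):\n",
    "    # R^2 = 1 - SS_res / SS_tot, computed separately for each column\n",
    "    ss_res = ((t - p) ** 2).sum(axis=0)\n",
    "    ss_tot = ((t - t.mean(axis=0)) ** 2).sum(axis=0)\n",
    "    return 1 - ss_res / ss_tot\n",
    "\n",
    "per_target = r2_by_hand(y_true, y_pred)\n",
    "print('by hand:   ', per_target)\n",
    "print('raw_values:', r2_score(y_true, y_pred, multioutput='raw_values'))\n",
    "# Default multioutput='uniform_average' is the unweighted mean over the targets\n",
    "print('default:   ', r2_score(y_true, y_pred), 'mean by hand:', per_target.mean())\n",
    "```"
   ]
  },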
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
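    "# 5. The code shows how the hyperparameter alpha_ is tuned for ridge regression (RidgeCV) and Lasso (LassoCV). Using the results of the two best models and of the ordinary least squares linear regression model, explain when to use ridge regression, when to use Lasso, and when to use least squares."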
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
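    "## The choice of method depends mainly on the strengths and weaknesses of each; pick the one whose strengths offset the others' weaknesses."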
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
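    "Ridge regression (RidgeCV): squared-error loss plus an L2 penalty on the coefficients, which shrinks all coefficients toward zero but does not set them exactly to zero. Pros: the objective is differentiable everywhere (it even has a closed-form solution), so it is easy to compute, and it stays stable when features are strongly correlated. Cons: it keeps every feature (no feature selection), and because the loss is still squared error it remains sensitive to outliers in the target."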
   ]
  },
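  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the two ridge properties named above, on synthetic data (an assumption): scikit-learn's `Ridge` minimizes `||y - Xw||^2 + alpha * ||w||^2`, whose solution for centered data is `w = (Xc^T Xc + alpha * I)^(-1) Xc^T yc`, and a larger `alpha` shrinks the coefficients without making any of them exactly zero. The helper `ridge_closed_form` is hypothetical, written only for this illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.datasets import make_regression\n",
    "from sklearn.linear_model import Ridge\n",
    "\n",
    "X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=1)\n",
    "\n",
    "def ridge_closed_form(X, y, alpha):\n",
    "    # Center so the intercept is not penalized (matches fit_intercept=True)\n",
    "    x_mean, y_mean = X.mean(axis=0), y.mean()\n",
    "    Xc, yc = X - x_mean, y - y_mean\n",
    "    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc without forming an explicit inverse\n",
    "    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)\n",
    "    b = y_mean - x_mean @ w\n",
    "    return w, b\n",
    "\n",
    "w, b = ridge_closed_form(X, y, alpha=10.0)\n",
    "model = Ridge(alpha=10.0).fit(X, y)\n",
    "print(np.allclose(w, model.coef_), np.isclose(b, model.intercept_))\n",
    "\n",
    "# Larger alpha shrinks the coefficient vector, but its entries stay non-zero\n",
    "for alpha in [0.1, 10.0, 1000.0]:\n",
    "    w_a, _ = ridge_closed_form(X, y, alpha)\n",
    "    print(alpha, np.linalg.norm(w_a).round(3), np.count_nonzero(w_a))\n",
    "```"
   ]
  },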
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
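    "Lasso: squared-error loss plus an L1 penalty on the coefficients, which can shrink some coefficients to exactly zero and thus performs feature selection (a form of dimensionality reduction). Pros: produces sparse models that are easier to interpret and less prone to fitting noise through irrelevant features. Cons: the L1 penalty is not differentiable at 0, so there is no closed-form solution and the model must be fitted iteratively (scikit-learn uses coordinate descent)."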
   ]
  },
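  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The exact zeros come from the kink of the L1 penalty at 0: in coordinate descent each coefficient update is a soft-thresholding step, which maps every input whose magnitude is below the threshold to exactly 0. A minimal sketch (the `soft_threshold` helper and the synthetic data are assumptions for illustration) that shows the operator and then counts exact zeros in `Lasso` versus `Ridge` fits on data where only 5 of 20 features are informative:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.datasets import make_regression\n",
    "from sklearn.linear_model import Lasso, Ridge\n",
    "\n",
    "def soft_threshold(z, t):\n",
    "    # Shrink toward 0 by t; anything with |z| <= t becomes exactly 0\n",
    "    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)\n",
    "\n",
    "print(soft_threshold(np.array([-3.0, -0.5, 0.2, 2.0]), 1.0))\n",
    "\n",
    "# 20 features, only 5 informative (synthetic, illustrative assumption)\n",
    "X, y = make_regression(n_samples=200, n_features=20, n_informative=5,\n",
    "                       noise=5.0, random_state=0)\n",
    "lasso = Lasso(alpha=1.0).fit(X, y)\n",
    "ridge = Ridge(alpha=1.0).fit(X, y)\n",
    "print('exact zeros, Lasso:', np.sum(lasso.coef_ == 0))\n",
    "print('exact zeros, Ridge:', np.sum(ridge.coef_ == 0))\n",
    "```"
   ]
  },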
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
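    "Huber loss combines the two behaviours: it is quadratic for small residuals (differentiable everywhere, like squared error) and linear for large residuals (less sensitive to outliers, like absolute error); scikit-learn provides it as HuberRegressor. To combine the L1 and L2 penalties of Lasso and ridge instead, use Elastic Net (ElasticNetCV)."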
   ]
  },
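  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the outlier behaviour described above, using scikit-learn's `HuberRegressor` against `LinearRegression` (ordinary least squares) on a synthetic line y = 2x + 1 with five large outliers injected at one end (the data are an assumption for illustration). The outliers pull the least-squares slope away from 2, while the Huber fit, whose loss grows only linearly for large residuals, should stay much closer:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.linear_model import HuberRegressor, LinearRegression\n",
    "\n",
    "# Synthetic line y = 2x + 1 with small noise (illustrative assumption)\n",
    "rng = np.random.default_rng(0)\n",
    "x = np.linspace(0, 10, 50)\n",
    "y = 2 * x + 1 + rng.normal(0, 0.5, size=x.size)\n",
    "y[-5:] += 30  # inject large outliers at the right end\n",
    "\n",
    "X = x.reshape(-1, 1)\n",
    "ols = LinearRegression().fit(X, y)\n",
    "huber = HuberRegressor().fit(X, y)  # default epsilon=1.35\n",
    "print('OLS slope:  ', ols.coef_[0])\n",
    "print('Huber slope:', huber.coef_[0])\n",
    "```"
   ]
  },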
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
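    "Ordinary least squares adds no penalty term to the squared-error loss. It suits cases where the noise is low, the features are not strongly correlated with each other, and no feature selection or dimensionality reduction is needed."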
   ]
  },
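  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the trade-offs side by side, here is a minimal sketch that fits `LinearRegression`, `RidgeCV` and `LassoCV` on one synthetic split (`make_regression` with 30 features of which 5 are informative; an assumption, not the housing data) and reports test R² and the number of exactly-zero coefficients. The numbers depend on the data and are not the results of this notebook.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.datasets import make_regression\n",
    "from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV\n",
    "from sklearn.metrics import r2_score\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# Synthetic data: 30 features, only 5 informative (illustrative assumption)\n",
    "X, y = make_regression(n_samples=120, n_features=30, n_informative=5,\n",
    "                       noise=20.0, random_state=0)\n",
    "X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)\n",
    "\n",
    "models = {\n",
    "    'LinearRegression': LinearRegression(),\n",
    "    'RidgeCV': RidgeCV(alphas=[0.01, 0.1, 1, 10, 100]),\n",
    "    'LassoCV': LassoCV(cv=5),\n",
    "}\n",
    "for name, model in models.items():\n",
    "    model.fit(X_tr, y_tr)\n",
    "    n_zero = np.sum(model.coef_ == 0)  # exact zeros = features dropped\n",
    "    print(f'{name:16s} test R2 = {r2_score(y_te, model.predict(X_te)):.3f}, zero coefs = {n_zero}')\n",
    "\n",
    "# Regularization strengths selected by cross-validation\n",
    "print('RidgeCV alpha_:', models['RidgeCV'].alpha_)\n",
    "print('LassoCV alpha_:', models['LassoCV'].alpha_)\n",
    "```"
   ]
  },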
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
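    "## In summary: sensitivity to noise (tendency to overfit), typically: least squares > ridge > Lasso. Coefficient shrinkage: Lasso (can set coefficients exactly to zero) > ridge (shrinks them toward zero) > least squares (no shrinkage). Computational simplicity: least squares > ridge > Lasso. Choose the method whose strengths best offset the weaknesses that matter for the data at hand."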
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
